Energetic concerns


Rewarding existing nuclear power plants for the value of their low-carbon power makes sense, but
the nuclear industry has a lot of work to do if it is to survive and thrive in the twenty-first century.


When the state of New York moved last December to require
utility companies to provide 50% of their power through
renewable sources by 2030, questions about nuclear power
naturally arose. Six nuclear reactors at four facilities currently provide 
more than 30% of the state’s electricity — and more than half of its 
low-carbon source. Four of those plants were at risk of closure owing 
to simple economics: they have not been able to compete with cheap 
natural gas. 

After factoring in the climatic value of low-carbon power gener- 
ated at these stations, however, state regulators created a new subsidy 
on 1 August. The state began with the ‘social cost of carbon’, which
represents the damage caused by greenhouse-gas emissions. The US 
government's central estimate is currently US$38 per tonne of carbon 
dioxide, rising to $50 in 2030. Revenues were well below that, so these 
plants will now be eligible for a ‘zero-emissions credit’ designed to 
make up the shortfall. In the first 2 years alone, that subsidy could be 
roughly $965 million. Illinois-based Exelon Corporation, which owns 
two of the facilities and is in negotiations to purchase the third, said it 
would press forward with its plan to keep the plants running. 

The first lesson is that the price of carbon matters. New York is one 
of nine eastern states participating in an emissions trading system. 
The current price — averaging around $4 per tonne of CO₂ — was
not high enough to keep nuclear power competitive with natural gas. 

The US nuclear industry, and some pro-nuclear environmentalists,
have hailed the New York standard as a precedent, and rightly so.
It’s a potential model for other US states in which nuclear power is 
facing similar economic hurdles. More generally, it’s yet another 
reminder that climate policies have a long way to go, despite the 
rhetoric enshrined in the Paris climate agreement last year. 

The nuclear industry’s woes don’t end there, however. Roughly 
440 nuclear power plants currently provide 11% of the world’s elec- 
tricity, but they are on average 30 years old. More than 60 reactors are 
under construction, but the industry must work just to maintain its 
share of the energy mix as older plants close in the coming decades. 

Simultaneously, New York state is opposing efforts to extend the 
lives of two other reactors at the Indian Point Energy Center on safety 
grounds. The operator has been fending off questions about tritium 
contamination in groundwater and various equipment malfunctions 
while applying for a permit from the US Nuclear Regulatory Commis- 
sion to extend the life of the reactors from 40 to 60 years. 

As long as nuclear power plants can demonstrate that they can oper- 
ate safely, their contribution to the global effort to reduce greenhouse 
gases should be encouraged. But the reality is that there may be places 
where governments — and communities — decide that the potential 
price of a nuclear accident is too high. Whether the industry can expand 
in any meaningful way may depend on a new — and as yet unproven 
— generation of accident-proof reactors. Despite its efforts to keep a 
few reactors alive for now, New York is clearly betting on renewables.


CERN’s road bump 


The disappearing LHC signal is disappointing 
for those pitching for the next big accelerator. 


Science thrives on discovery, so it’s natural for physicists to mourn
this week. As the high-energy-physics community gathered in
Chicago on Friday, hopes were high (if cautious) that the Large
Hadron Collider (LHC) at CERN, Europe’s particle-physics laboratory
near Geneva, Switzerland, had chalked up another finding to
build on the discovery of the Higgs boson. Not so — the bump in the
data that had caused such excitement was washed away with a flood
of data that revealed it to be a mere statistical fluctuation.

Ordinarily, physicists would be satisfied if the LHC continued its
bread-and-butter existence of confirming with ever-greater precision
the standard model — a remarkably successful theory that is known to
be incomplete. But the excitement over the bump has left them hungry
for more. As is evident from the 500 theory papers written about the
bump, physics is ready for something new.

That the LHC has not turned up anything beyond the standard model 
does not mean it never will. The machine has collected just one-tenth of 
the data that scientists hoped to amass by the end of 2022, and just 1% of 
those it could collect if a planned revamp to increase the intensity of collisions
goes ahead. But the dry spell worries some. The idea of supersymmetry
predicts that heavier counterparts to regular particles will become
evident at higher collision energies. Before the LHC was switched on, 
fans of the theory would have gambled on being able to see something 
by now. And if the dry spell extends to a drought, high-energy physics 
could descend into what some call the nightmare scenario — the collider 
finds nothing beyond the Higgs boson. Without ‘new’ physics, there is 
no thread to pull to unravel the countless mysteries that the standard 
model fails to account for, including dark matter and gravity. 

There remain strong reasons to build a successor machine. But with- 
out another discovery, the public’s delight in high-energy physics could 
fade: there comes a time when exploration alone no longer satisfies. 

Convincing funding agencies to cough up several billion dollars to 
continue the same approach will therefore be tough, especially when 
neutrino and lab-based precision experiments cost a fraction of the 
price. It will be physicists’ job to consider carefully the worth of pursuing 
that discovery strategy. And if high-energy colliders remain essential, 
they need to work on their sales pitch.
WORLD VIEW


Steer driverless cars
towards full automation


For cars to be safe, full control must be allocated to the driver — be it human or
computer, argues John Baruch.


As the popularity of self-driving cars increases, so do the concerns.
Last month, China banned tests of autonomous vehicles
on public roads. And investigations continue into the
death of Joshua Brown, who was killed when his Tesla car on autopilot
ploughed into the side of an articulated truck in Florida. His car used
visible-light cameras to image the road and computers to evaluate the
situation. But according to Tesla, the white truck merged with the
bright Florida sky and was not recognized.

Self-driving vehicles promise to extend and enrich travel. But
they raise profound questions for society about our relationship
with machines. How will people cope, and what business models
will develop?

Will car manufacturers emulate jet-engine manufacturers such
as Rolls-Royce and Pratt & Whitney and start to sell travel distance
and continuously record the operation of every
engine they sell? Will car owners use the Uber
model to rent out their vehicles as taxis instead
of leaving them in a car park? Will automation
revolutionize rural transport and give the rural
poor, young, old and disabled the low-cost travel
they are entitled to?

The challenges for the architects of our city
centres, housing, streets, schools and workplaces
are immense. And the issues around control of
information such as big data and privacy must
be confronted. Car manufacturers may find that
the information they glean from tracking the
lifestyle of their customers is worth much more
than their vehicles.

The scientific community and society in general
need to engage with these questions, and together decide what
kind of future we want and how autonomous vehicles fit in. If we
don't, the future is likely to be mapped out by companies that merely
want to make money from the technology. The questions are complex,
but they can be boiled down to one: should self-driving cars have a
steering wheel?

The big Internet companies such as Google, Apple and Baidu — those
generating the real pressure for self-driving vehicles — do not think that
they should. These companies are keen to maximize time online for
those who are rich enough to afford a car. Google's business model can
use daily commute time that is no longer spent driving a car to increase
the value of its advertising. It would therefore not want self-driving vehicles
that allow the driver to take over. The vehicle is totally autonomous.

A number of driverless vehicles with no steering wheel and no
opportunity for people to take control are on trial — including a
parking transit system at London Heathrow Airport and buses in the
Netherlands, Italy and China. It is the Chinese who are taking the
opportunity most seriously, where Internet companies are working
with vehicle manufacturers.

But this model poses a problem for Tesla and for many other 
car manufacturers, especially for the more expensive brands. The 
appeal to customers of luxury car brands tends to be the driv- 
ing experience. And if the car has no steering wheel — and the 
‘driver’ is a mere passenger — that appeal evaporates. That's why 
Tesla, Jaguar Land Rover and others use the technology in existing 
self-driving cars only to provide support, with the driver officially 
remaining in charge. Formally, Brown was in charge of the vehicle 
he died in. 

Car and component companies are working hard to generate a 
business model for more-autonomous vehicles that have steering 
wheels. It is clear that they remain keen to provide the driver with 
assistance, and that means there is a real need for research: on both 
the technical and social-science aspects. How will drivers react 
when the car tells them to take over? How can 
the handover be made safe? How can the vehicle 
be brought to a safe halt if the driver does not 
take charge? 

I operate an autonomous robotic telescope 
in the Canary Islands that is 3,000 kilometres 
from its base in the United Kingdom. Autono- 
mous telescopes do not pose the same dangers 
to the public as self-driving vehicles do, but 
there is a lot that can be learnt from our experi- 
ences. We have removed nearly all the single- 
point failure modes by quadruplexing all the 
crucial information flows (when one fails, you 
can still poll the others and isolate the failure) 
and have instituted an artificial-intelligence 
reconfiguration process that isolates failure 
until it is repaired. With quadruplexed systems, Brown might not 
have died. The car would have slowed down, if only because the 
sensor systems were experiencing confusion. 
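
The idea of polling redundant channels and isolating the one that fails can be illustrated with a minimal Python sketch. This is not the telescope's actual software, which the article does not describe in detail; the function name `poll_channels`, the `tolerance` parameter and the median-based cross-check are all assumptions chosen for illustration.

```python
from statistics import median


def poll_channels(readings, tolerance=0.5):
    """Cross-check redundant readings and flag channels that look faulty.

    `readings` holds one value per redundant channel; None means the
    channel did not answer. Returns (agreed_value, failed_indices).
    Hypothetical helper: the name, tolerance and median rule are
    illustrative assumptions, not taken from the article.
    """
    # Keep only the channels that actually responded.
    live = {i: r for i, r in enumerate(readings) if r is not None}
    if len(live) < 2:
        raise RuntimeError("too few live channels to cross-check")

    # The median is robust to a single wildly wrong channel.
    consensus = median(live.values())

    # A channel has failed if it is silent or disagrees with the consensus.
    failed = sorted(
        i for i, r in enumerate(readings)
        if r is None or abs(r - consensus) > tolerance
    )

    healthy = [r for i, r in live.items() if i not in failed]
    if not healthy:
        raise RuntimeError("no channels agree; stop and ask for help")

    # Average the channels that are still trusted.
    return sum(healthy) / len(healthy), failed


if __name__ == "__main__":
    # Four copies of the same measurement; channel 2 reports nonsense.
    value, failed = poll_channels([10.0, 10.1, 55.0, 9.9])
    print(value, failed)
```

With four channels, one silent or wildly wrong reading is outvoted by the other three, and the caller learns which channel to isolate. When the channels disagree so much that none can be trusted, the sketch raises an error rather than guessing, which loosely mirrors the point that a confused system should slow down.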

The driver-support philosophy is a flawed approach. Some air- 
craft already have technology that can take the plane from runway 
to runway, with the pilot just taxiing the plane to and from the 
stand. It is generally not used, because, with no role in the flight, 
pilots can become bored and do other things. They are then totally 
unprepared to take over if needed. Incremental support is not a 
safe compromise. People must either drive a car or be driven by it. 

The efforts of the luxury car brands are reminiscent of when gas 
companies tried to improve lighting by adjusting lantern mantles 
when electric lighting appeared. For self-driving cars to rule the 
road, the steering wheel must go the way of the Model T Ford.


John Baruch is a senior lecturer at the University of Bradford, UK, 
and visiting professor at South China University of Technology in 
Guangzhou. 

e-mail: j.e.f.baruch@bradford.ac.uk
RESEARCH HIGHLIGHTS Selections from the scientific literature


Minerals mimic 
synthetic structure 


Researchers have found 
naturally occurring metal- 
organic frameworks (MOFs) 
— chemical structures that 
were thought to exist only 
when made in the lab. 

MOFs have open, porous 
architectures, which could 
make them useful in catalysis, 
photovoltaics and other 
applications. Tomislav
Friščić at McGill University
in Montreal, Canada,
Sergey Krivovichev at Saint
Petersburg State University
in Russia and their colleagues
used X-ray diffraction to 
study two samples from a 
permafrost drill core, which 
was taken from a Siberian coal 
mine 230 metres below Earth's 
surface more than 70 years ago. 
They observed that the rare 
organic minerals stepanovite 
and zhemchuzhnikovite 
contain channels, pores and 
other structures that are found 
in synthetic MOFs. 

These are the only organic 
minerals known so far to 
have open architectures, the 
authors say. 

Sci. Adv. 2, e1600621 (2016)


URBAN ECOLOGY


Insect mix high in
rich areas


The interiors of homes in affluent neighbourhoods host
a wider diversity of insects and spiders than do those in less
wealthy areas.
Neighbourhoods with a high income often have a
higher diversity of plants and certain animals, such as birds,
than other areas. To find out whether this ‘luxury effect’
extends indoors, Misha Leong at the California Academy of
Sciences in San Francisco and her colleagues sampled all
arthropods — living and dead — including insects (pictured
is Sciara hemerobioides), spiders and millipedes, inside
50 homes in and around Raleigh, North Carolina. They
found that arthropod diversity increased with house size
and diversity of surrounding vegetation, and were surprised
to find a strong influence of average neighbourhood
income, too.

Affluence could be affecting arthropod diversity
through urban planning and landscaping at the
neighbourhood level.

Biol. Lett. 12, 20160322 (2016)


Gentle birth of a comet


The comet 67P/Churyumov-Gerasimenko (pictured), which
has been orbited by the Rosetta spacecraft since 2014, might
date back to the primordial Solar System billions of years ago.
A team led by Björn Davidsson at NASA’s Jet Propulsion
Laboratory in Pasadena, California, used instruments on the
European Space Agency's spacecraft to examine the structure
of the comet’s core. The porous consistency of 67P shows
that it did not form through violent collisions. Instead, the
authors propose that the comet was made gradually, when
icy pebbles from the outer reaches of the developing Solar
System clumped together. The two lobes of 67P may have
gently joined together during the final stages of the comet's
formation.

Astron. Astrophys. 592, A63 (2016)


PALAEOECOLOGY 


Thirst finished off 
the mammoths 


One of the last woolly 
mammoth populations died 
out on an island off the coast 
of Alaska nearly 6,000 years 
ago, probably because of a 
shrinking supply of fresh water. 
Human hunting has been 
linked to the extinction of 
the species (Mammuthus 
primigenius), but this relict 
population perished without 
our help, according to Russell 
Graham of Pennsylvania State 
University in University Park 
and his colleagues. The authors 
examined ancient DNA, 
isotopes and plant and animal 
material in sediment cores 
from a lake on St Paul Island. 
They also studied mammoth 
fossils. The researchers estimate 
that the island’s mammoths 
became extinct 5,600 years ago, 
when the island was shrinking 
because of sea-level rise and 
the lake was evaporating into 
a salty puddle — perhaps 
because of long-standing 
drought, or depletion by the 
mammoths themselves. 
Freshwater scarcity could 
drive island extinctions more 
often than previously thought, 
the authors say — and will only 
increase as the climate changes. 
Proc. Natl Acad. Sci. USA 
http://doi.org/bm9z (2016) 


No sign of new 
neutrino 


A massive detector at the South 
Pole has found no evidence of 
a ‘sterile’ neutrino: a near- 
massless particle that is thought 
to interact only through gravity. 
Hints of this possible 
fourth type of neutrino first 
emerged in the 1990s, and 
were rekindled early this year 
by an experiment in China. 
In the latest work, researchers 
at the IceCube Neutrino
Observatory in Antarctica, 
led by Francis Halzen at the 
University of Wisconsin- 
Madison, counted neutrinos 
of a known type that hit the 
detector from below. A dearth 
of these neutrinos at particular 
energies would have revealed 
that some of the particles had 
temporarily mutated into 
sterile neutrinos during their 
trip through Earth, but the 
researchers found no such 
feature in their data. 

The experiment did not 
rule out the existence of 
heavier sterile neutrinos. 

A fourth kind of neutrino 
would challenge the standard 
model of particle physics, 
which allows for only three 
neutrino types. 

Phys. Rev. Lett. 117, 071801 (2016)


Ancient whales 
heard high notes 


Fossil evidence suggests that 
the first whales could detect 
high-frequency sounds. 
Researchers have 
debated whether animals 
called archaeocetes — the 
common ancestors of 
all modern whales and 
dolphins — specialized in 
hearing high frequencies, 
like modern killer whales, or 
low frequencies, like today’s 
humpback whales. Morgan 
Churchill at the New York 
Institute of Technology in Old 
Westbury and his colleagues 
describe a new species of 
whale (fossil skull pictured) 
dating from 27 million to 
24 million years ago. Features 
of its remarkably well- 
preserved inner ear, as well as 
other structures, suggest that 
the animal could generate
and hear high-frequency
sounds. The inner ear also has 
primitive features similar to 
those of archaeocetes. 

The authors suggest that 
the first whales could hear 
higher frequencies than 
their terrestrial ancestors 
— an ability co-opted by 
later toothed whales for 
echolocation. 

Curr. Biol. http://doi.org/bnh5 
(2016) 


Immune cells tire 
out in tumours 


After they invade tumours, immune cells gradually lose
their ability to produce energy.
Greg Delgoffe and his colleagues at the University
of Pittsburgh in Pennsylvania studied immune cells called
T cells in mice with implanted tumours. They found that
T cells inside tumours were less effective at taking
up glucose than those in other parts of the body. The
tumour-infiltrating cells also showed reduced total
mass of mitochondria — cell organelles that produce energy
— and contained abnormally shaped mitochondria. The
metabolic defects were linked to reduced levels of PGC1α,
a protein that regulates mitochondrial replication
during cell division. When the researchers used a virus
to boost PGC1α expression in T cells and gave the cells
to tumour-bearing mice, the tumours shrank more
and the animals lived longer than those that received
non-reprogrammed cells.

Boosting metabolic 
processes in immune cells 
could help to improve cancer 
therapies, the authors say. 
Immunity http://doi.org/bndn 
(2016) 


PHYSICS 


Crack patterns in 
freezing water 


Water droplets landing on a
cold surface fragment into one 
of two different patterns as 
they freeze, depending on the 
temperature of the surface. 
Elisabeth Ghabache and her 
colleagues at the University 
of Pierre and Marie Curie 
in Paris used a high-speed 
camera to monitor the 
behaviour of pancake-shaped 
water droplets that froze on a
cold steel surface after being 
dropped from a height of 
36 centimetres. They observed 
no crack formation when the 
surface was at -20°C (pictured 
left). At -30°C and -40°C, 
cracks spread from a central 
point towards the ‘pancake’ 
edge (centre). At -50°C and 
-60°C, the cracking occurred 
in a step-by-step manner,
with the initial cracks splitting 
into newer ones at roughly 
90-degree angles (right). The 
team used fracture modelling 
to determine the transition 
temperatures between the 
different fragmentation 
regimes. 
Fragmentation occurs in
many physical processes,
such as bubble bursting and glass breaking. This
model system could help researchers to learn
more about various fracture
mechanisms, the authors say.
Phys. Rev. Lett. http://dx.doi.org/10.1103/physrevlett.117.074501
(2016)


MICROBIOLOGY 


Toxic bacteria 
adapt fast 


Harmful blue-green algae 
can adapt rapidly to changing 
environments. 

The photosynthetic 
cyanobacterium Microcystis 
produces toxic blooms in 
lakes and reservoirs. To test 
how different strains respond 
to changing carbon dioxide 
levels in water, Jef Huisman 
and his colleagues at the 
University of Amsterdam 
kept mixed populations in the 
laboratory and aerated the 
water with bubbles containing 
low or elevated levels of 
CO₂. In low CO₂ conditions,
strains whose carbon-uptake 
systems are efficient when 
carbon is limited became 
dominant. When CO₂ was
elevated, however, strains 
that have systems with high 
uptake rates outcompeted 
the others. The team studied 
Microcystis collected from 
Lake Kennemermeer in the 
Netherlands and found that 
the abundance of each strain 
shifted with seasonal changes 
in CO₂ availability.

Cyanobacteria may be more 
adept at dealing with high CO₂
levels than previously thought. 
Proc. Natl Acad. Sci. USA 
http://doi.org/bnf9 (2016) 
SEVEN DAYS The news in brief


PEOPLE 


Femtochemist dies 
Ahmed Zewail, the winner
of the 1999 Nobel Prize in
Chemistry, died on 2 August,
aged 70. He is credited
with founding the field of
femtochemistry, which
probes the mechanics of
chemical reactions using
laser pulses lasting just tens of
femtoseconds (1 femtosecond
is 10⁻¹⁵ s). An Egyptian-born
US citizen working at
the California Institute of
Technology in Pasadena,
Zewail was the first Arab
to win a science Nobel.
He championed science
education and research in his
native country and founded
the Zewail City of Science and
Technology, a university that
opened its doors to students
in 2012 near Cairo.


RESEARCH
Cancer drug fails 


A promising lung-cancer 
drug failed in a key clinical 
trial, sending stock in the 
drug’s developer, Bristol- 
Myers Squibb, tumbling by 
17% on 5 August. The drug, 
called Opdivo (nivolumab),
is one of a suite of new cancer
therapies that release immune
responses against tumours
by blocking a protein called
PD-1. Opdivo has been shown 
to benefit some people with 
advanced cancers, including 
lung cancer. But Bristol-Myers 
Squibb, which is based in New 
York City, announced last 
week that Opdivo had failed 
as a front-line therapy for lung 
cancer. Stock in a competing 
firm, Merck of Kenilworth, 
New Jersey, rose 10% after the 
news. 


Turkish purge

The Turkish research agency
TÜBİTAK in Ankara has
removed 139 staff from their
posts, pending investigations
into their possible connections
with the Gülen movement,
which President Recep Tayyip
Erdoğan claims was behind
the country’s attempted coup
last month. A further 28 staff
have resigned, according
to statements by Turkish
science minister Faruk Özlü
on 5 August. TÜBİTAK,
which helps to design the
country’s research policy and
distributes grant money, has
previously been purged. In
2014, agency engineers were
dismissed after they declared
that incriminating recordings
of allegedly tapped telephone
conversations between
Erdoğan and his son — which
Erdoğan said were fabricated
— were not manipulated.


Iran executes scientist


Shahram Amiri, an Iranian nuclear scientist, has been hanged
for espionage, his country’s officials announced on 7 August.
Amiri said that he had been abducted by the CIA during a
pilgrimage to Saudi Arabia in 2009, and taken to the United
States to be interrogated and tortured. But the US government
denied that, and US media alleged that he had been a paid
informant who defected voluntarily. He returned to Iran in
2010 and was later convicted of passing secrets about Iran’s
nuclear activities to the United States. Iran has maintained
that its nuclear programme has only peaceful purposes. Amiri
was reportedly an isotope researcher at the Malek-Ashtar
University of Technology in Tehran.


Mosquito trial 

The US Food and Drug 
Administration announced 
on 5 August that a proposed 
Florida field test of genetically 
modified mosquitoes poses 
few risks to the environment 
or human health. The 
announcement clears the way 
for the board of the Florida 
Keys Mosquito Control 
District to allow the release 

of Aedes aegypti mosquitoes 
carrying a gene that kills 

the insects’ offspring, in an 


effort to control diseases they 
transmit, including dengue and 
Zika viruses. Local residents 
will vote in a non-binding 
referendum, currently slated 
for November, before the board 
decides whether to go ahead 
with the trial. 


EVENTS 


Missing vaccines 

An investigation by the 
Associated Press has reported 
that 1 million yellow-fever 
vaccines sent in February by 
the World Health Organization 
and its partners to tackle a large 
outbreak in Angola cannot 

be accounted for. Six million 
vaccines were sent in total. 

Of those that can be traced, 
some were sent to regions 
where there was no yellow 
fever; others were improperly 
stored, or arrived without the 
syringes to administer them, 
according to the 5 August 
report. The agencies involved 
have responded that a wastage 
of around 10% is expected in 
mass-vaccination campaigns 
for yellow fever — and Angolan 
officials have denied that any 
vaccines went missing. 


Goodnight Yutu 
China’s moon rover Yutu, or 
Jade Rabbit, was officially 
declared dead by state officials 
on 3 August. It arrived on 

the Moon in December 

2013, and was intended to 
carry out a three-month 
exploration of the lunar 
surface, but it survived for 
more than two years before 
going dark for the last time. 
The six-wheeled, solar- 
powered rover was struck 

by mechanical difficulties in 
early 2014, but had already 
used its penetrating radar 

to probe the structure of the 
lunar soil to a depth of more 
than 100 metres, and sent 
back data and high-resolution 
images to Earth. The mission 
made China the third country
to land a craft on the Moon,
after the United States and the
Soviet Union.


APS moves meeting 
A division of the American 
Physical Society (APS) has 
voted to move its 2018 annual 
meeting from its originally 
planned location of Charlotte, 
North Carolina, as a result of a
state law, enacted in March, that 
forces transgender people to 
use only toilets that correspond 
to their sex at birth. The chair 
of the APS Division of Atomic, 
Molecular, and Optical Physics 
said that it wanted to “provide
a welcoming environment
for all members”. The APS,
which made the statement 

on 4 August, will hold its 
conference in Florida instead. 


Private Moonshot 


A private mission to the Moon 
has been approved for the first 
time, by the US government. 
Moon Express, a company 

in Cape Canaveral, Florida, 
announced on 3 August that 
it had been given permission 
to travel beyond Earth’s orbit 
and put a robotic lander on
the Moon in 2017. Moon 
Express was founded in 

2010 by three technology 
entrepreneurs, and is one of 
the companies competing for 
the Google Lunar XPRIZE; 
the competition will award 
US$20 million to the first 
company to land a privately 
funded spacecraft on the 


TREND WATCH 


Zebrafish (Danio rerio) are the 
rising stars of model-organism 
research, an analysis of grants 


Moon. Moon Express is yet to 
finish building its MX-1 lander 
(pictured, artist's impression). 


Zikain Cuba 


Cuba has discovered two cases 
of locally acquired Zika- 

virus infection, the nation’s 
health ministry announced 

on 4 August. The country’s 
preventive campaigns had. 
been largely successful in 
staving off infections, with only 
30 imported cases identified 
in Cuba this year, and only one 
previous locally transmitted 
case, in March. On 3 August, 
the US National Institutes of 
Health (NIH) launched the 
first clinical trial ofa Zika 
vaccine, which it plans to test 
in 80 healthy volunteers. NIH 
officials say that the vaccine 
will probably not be ready for 
deployment until 2018. 


Data push-back 


A coalition of researchers has 
rebuffed a proposal to share 
clinical-trial data rapidly. In 


February, the International 
Committee of Medical Journal 
Editors (ICMJE) proposed 
that clinical-trial leaders 
should make the de-identified 
patient data that underlie a 
journal article public within 
six months of publication. 

The 282 signatories to an 
article published on 4 August 
in The New England Journal 

of Medicine said that the 
ICMJE proposal was too 
burdensome and would have 
unintended consequences, 
such as delaying publication 
of results (N. Engl. J. Med. 375, 
405-407; 2016). They said 
that researchers should have at 
least two years — but up to five 
— to make data public. 


US climate rule 
The White House released a 
sweeping new climate policy 
on 2 August, instructing all 
federal agencies, from the 
Department of Agriculture 
to the Department of 
Transportation, to consider 
the impacts of their actions 
on climate change from now 


ZEBRAFISH COURT FUNDING DOLLARS 


Grants for zebrafish (Danio rerio) research from the US National 
Institutes of Health’s RO1 award programme are on the rise. 


e== Drosophila eC. elegans «=D. rerio === Xenopus 


from the US National Institutes of 
Health (NIH) finds. A team at the 
NIH Office of Portfolio Analysis 
used text mining and manual 
searching to assess successful 
applications for RO1 awards, the 
NIH’s main grant programme for 
individual investigators. Whereas 


the proportions of grants allocated 


to zebrafish and Caenorhabditis 
elegans studies rose between 2008 


and 2015, the fraction for research 


with Xenopus frogs fell. 
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SEVEN DAYS | THIS WEEK | 


OMING UP 


21-25 AUGUST 
Scientists come together 
in Philadelphia, 
Pennsylvania, for the 
American Chemical 
Society's national 
meeting and exposition, 
where topics will 
include the science 
behind Pixar. 
go.nature.com/2ayecj7 


20-30 AUGUST 

Kuala Lumpur hosts 

the 34th biennial 
conference of the 
Scientific Committee on 
Antarctic Research. 
www.scar2016.com 


on. The agencies are also 
required to quantify those 
impacts, mainly in terms of 
greenhouse-gas emissions. 
The policy comes from the 
White House Council on 
Environmental Quality, which 
was established in 1969 to 
advise agencies when they are 
preparing environmental- 
impact statements. 


FUNDING
Climate-cut U-turn 


Australia’s government has 
ordered its national science 
agency to re-prioritize 

basic climate research, six 
months after the organization 
unveiled controversial plans 
to slash jobs in the sector. The 
Commonwealth Scientific 
and Industrial Research 
Organisation (CSIRO) 

will — on government 
instructions — create 15 new 
climate-science jobs and 
receive an extra Aus$37 million 
(US$28 million) over the next 
10 years, science minister 
Greg Hunt announced on 

4 August. But the intervention 
may have come too late to 
repair damage already caused, 
researchers say. See go.nature.com/2akgeyp for more.


The CMS (pictured) was one of two experiments at the Large Hadron Collider that saw hints of an unexpected particle.


PARTICLE PHYSICS


LHC particle hopes dashed 


Promising two-photon signal disappears as data pile up. 


BY ELIZABETH GIBNEY, CHICAGO, ILLINOIS 


It would have marked the beginning of a new era in particle physics. But the latest
data have squashed hopes that hints of an unexpected particle detected by the Large
Hadron Collider (LHC) would solidify with time. Instead, the intriguing data ‘bump’
first reported in December turns out to be just a statistical fluctuation.

Representatives from ATLAS and CMS (two independent experiments at the LHC, part
of the European particle-physics laboratory, CERN) presented the news at the
International Conference on High Energy Physics (ICHEP) in Chicago, Illinois, on
5 August. The analyses included nearly five times the amount of data used in
December, and show that the signal has faded to almost nothing.

“There is no significant excess seen in the 2016 data,” said Bruno Lenzi, an ATLAS
physicist based at CERN near Geneva, Switzerland, to a standing-room-only session
at ICHEP.

Additional data from CMS also failed to pro- 
duce a significant signal, says Chiara Rovelli, 
a physicist at the National Institute of Nuclear 
Physics in Rome. 

The announcement was a disappointment 
to researchers, but it wasn’t unexpected. The 
ATLAS team’s previous update, in June, put 
the signal's significance — a measure of the 
chances that random fluctuations in the data 
would produce such a bump without a par- 
ticle — at 2.1 sigma. That was well below the 
5-sigma threshold for determining whether a 
signal is a discovery or just noise. 
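
To put the sigma figures above in terms of probabilities, the short sketch below converts a significance into the chance that noise alone would fluctuate at least that far. It assumes a one-sided Gaussian tail, which is a common convention in particle physics but is an assumption here; the article does not say how ATLAS computed its figure, and the function name is purely illustrative.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Chance that a standard normal variable exceeds n_sigma (upper tail only)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


# Compare the June ATLAS figure with the conventional discovery threshold.
for n_sigma in (2.1, 5.0):
    p = one_sided_p_value(n_sigma)
    print(f"{n_sigma} sigma -> p = {p:.3g} (about 1 in {1 / p:,.0f})")
```

Under that assumption, 2.1 sigma corresponds to a chance of a little under 2%, whereas 5 sigma corresponds to roughly 3 in 10 million, which is why the earlier hint was never close to a discovery claim.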

But both ATLAS and CMS independently 
saw the signal, comprised of slightly more 
pairs of photons — with a combined energy of 
750 gigaelectronvolts — than expected. That 


gave physicists hope that the bump was real. 
Researchers around the world produced more 
than 500 papers trying to explain the potential 
particle. 

“Seeing a glimpse of something, even 
the half a glimpse that makes you hold your 
breath a moment and think, ‘what if’ — it’s 
too valuable to be left unexplored,” says Tara
Shears, a particle physicist at the University of 
Liverpool, UK. 


HISTORY OF A BUMP 

The cautious excitement was driven by the 
bump’s potential pay-off, says Don Lincoln, 
a physicist at the Fermi National Accelerator 
Laboratory near Batavia, Illinois. The stand- 
ard model is incomplete because it fails to 
account for mysteries such as dark matter, 
and can’t reconcile quantum mechanics with
gravity. A new particle would have directed
physicists towards an alternative theory,
says Lincoln. 

The signal was appealing in part because 
the analysis behind it was relatively simple 
and robust, says Christoffer Petersson, 
a theoretical physicist at Chalmers 
University of Technology in Gothenburg, 
Sweden. 

The fact that the particle could have been 
a heavier cousin to the Higgs boson was 
also enticing, says Guido Tonelli, a physicist 
at the University of Pisa in Italy and former 
head of CMS. 

Even though all those models are now 
wrong, it was a fun and useful exercise to 
try to explain the bump, says Petersson. 

Statistical fluctuations and discoveries 
look identical at first, says Lincoln. Such 
coincidences are always possible when 
performing thousands of searches across a 
wide range of particle masses. It has hap- 
pened before and will probably happen 
again, he says. 
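
Lincoln's point about thousands of searches can be illustrated with a small calculation. The sketch below estimates the chance that at least one search out of many shows a noise fluctuation of a given size. It assumes the searches are statistically independent and uses a one-sided Gaussian tail; both are simplifications, and the numbers chosen (1,000 searches, a 3-sigma bump) are hypothetical rather than taken from the LHC analyses.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Chance that noise alone fluctuates upward by at least n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def chance_of_any_false_alarm(n_sigma: float, n_searches: int) -> float:
    """Chance that at least one of n_searches independent searches
    shows a pure-noise fluctuation of n_sigma or more."""
    p_local = one_sided_p_value(n_sigma)
    return 1.0 - (1.0 - p_local) ** n_searches


# Hypothetical example: 1,000 independent searches.
for n_sigma in (3.0, 5.0):
    chance = chance_of_any_false_alarm(n_sigma, 1000)
    print(f"{n_sigma} sigma somewhere in 1,000 searches: {chance:.2%}")
```

With these illustrative inputs, a 3-sigma bump somewhere is more likely than not, whereas a 5-sigma bump from noise alone remains rare. That gap is the rationale for the high discovery threshold.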


ONWARD 

This false alarm does not affect the LHC’s 
chances of finding something else, says 
Petersson. For now, it is business as usual 
for the collider’s experiments. 

Still, there is some concern that 40 years 
after the development of the standard 
model, particle accelerators, including the 
LHC, have not found anything beyond it. 

It’s surprising that nothing unexpected 
has emerged from the LHC data, says Guy 
Wilkinson, a physicist at the University of 
Oxford, UK. This underscores a growing 
unease in the community: as time goes 
on without new findings, it becomes less 
likely that the most appealing versions 
of supersymmetry — arguably the most 
promising way to extend the standard 
model — are true. 

But Petersson notes that the chances that 
the LHC will find something beyond the 
standard model will go up this year and 
next, because the collider is operating near 
its maximum energy of 14 teraelectronvolts. 
If new particles are rare, or if they decay in 
ways that are hard to observe, they could 
take a while to emerge, he says. 

And there are other ways of finding new 
particles, says Shears. With enough data, 
particles that are too heavy to be produced 
directly could reveal themselves through 
subtle influences on well-known particles. 
Physicists with LHCb, another experiment 
at the collider, have already found such 
hints, but they need more information to 
confirm them. 

“We know already that sooner or later, 
one of these anomalies will survive all con- 
trols and suddenly — crack — everything 
will change,” Tonelli says. “The beauty of 
our work is that this could happen at any 
time.” SEE EDITORIAL P.125

Perlan 2 aims to break the glider altitude record of 15,445 metres.


ATMOSPHERIC RESEARCH 


Surfing glider set 
to study climate 


Perlan mission will ride stratospheric waves and conduct atmospheric research.


BY DECLAN BUTLER 


A glider that aims to soar higher than any
piloted aircraft will begin its first
campaign this month in the skies above 
Argentina. For its pilots and engineers, the Per- 
lan Project holds the excitement of breaking the 
world altitude record for gliding — and perhaps 
one day reaching close to the vacuum of space. 
But for Elizabeth Austin, the project’s chief 
scientist, there's another thrill: the glider will 
carry scientific instruments for climate, aero- 
space and stratospheric research that cannot be 
done using other means. “The possibilities are 
just so incredible,” says Austin, an atmospheric
physicist and the founder of forecasting service 
WeatherExtreme in Incline Village, Nevada. 
The carbon-fibre glider, built with a pres- 
surized cabin, is intended to achieve sustained 
flight at around 27,000 metres, where the den- 
sity of air is about 2% of that at sea level. In the 
series of flights that the craft will begin in mid- 
August, it will fly to only 15,000-18,000 metres 
— in part because of weather conditions — but 
this could still break the glider altitude record 
of 15,445 metres, set by an earlier Perlan model. 
The glider will carry instruments to measure 
levels of aerosols and greenhouse gases, includ- 
ing ozone, methane and water vapour, and will 
gather information on the exchange of gases 
and energy between the two lower layers of
Earth’s atmosphere: the troposphere and the
stratosphere. Those data, to be collected this 
year and next, could improve climate models, 
which account poorly for these atmospheric 
interactions and contain “horrific” uncertain- 
ties about the levels and behaviour of water 
vapour at stratospheric altitudes, Austin says. 

Scientific balloons have already flown at 
much higher altitudes, but they must follow 
the wind, Austin adds, whereas a pilot can steer 
and circle a glider. “We can spend hours fly- 
ing where we want. A glider is an incredible 
scientific platform as there’s no other way to 
get this sort of data.” 

“It’s an extremely exciting project,” says Jie 
Gong, an expert in atmospheric dynamics at 
NASA’s sciences and exploration directorate
in Greenbelt, Maryland. On the basis of its 
intended flight route, the Perlan glider might 
be able to provide the first direct observations 
of polar stratospheric clouds, a unique type of 
ice cloud that forms in the polar stratosphere 
and helps to deplete ozone, Gong adds. 

The glider is named after those same clouds, 
which have an iridescent mother-of-pearl 
appearance (Perlan means ‘pearl’ in Icelandic).
They are typically generated at high
altitudes by stratospheric mountain waves — 
when strong winds that blow over the tops 
of high mountains are driven up towards 
space. In 1992, a retired NASA test pilot,
Einar Enevoldson, founded the Perlan Project
with the aim of creating a glider that could surf 
these waves up to the stratosphere. And in 2006, 
he and the US adventurer Steve Fossett proved 
the concept with their record-breaking flight 
on Perlan 1, a modified conventional glider. 

But Fossett’s death the following year in a
light-aircraft accident set the project back until 
July 2014, when European aerospace group 
Airbus became a major sponsor and contrib- 
uted its research expertise. The Perlan 2 craft 
made its maiden flight last year in Oregon, and 
in March surfed its first mountain waves above 
the Sierra Nevada range in California. 

Its next flights will be over El Calafate on 
the eastern and southern fringes of the Andes 
range in Argentina. There, during the South 
Pole’s winter, a fast-moving, high-altitude jet 
stream called the polar-night jet extends from 
the troposphere into the upper atmospheric 
layers — helping the Andes mountain waves 
(and the glider) to reach the stratosphere (see 
‘Science on a glider’). 

Besides its atmospheric chemistry, Perlan 2 
will carry instruments to study turbulence in 
stratospheric mountain waves, and to explore 
the microphysics of interactions between 
mountain waves and polar meteorology, 
which ultimately affect weather variability. 
Information on how mountain waves break 
in the stratosphere is “extremely limited”, says 
Gong, and requires detailed, fine-scale data on
temperature, humidity and wind, which the
glider is uniquely placed to measure. Airbus
says that many of the weather phenomena
Perlan 2 will encounter will provide useful
information for it and other aircraft makers
that are contemplating operating aeroplanes
at higher altitudes.

[Figure: ‘Science on a glider’. The Perlan glider aims to fly higher than any other piloted aircraft, and to conduct unique stratospheric research.]

Once Perlan is fully tested, says Austin, she 
hopes to get funding to use the glider as a long- 
term scientific platform that would examine 
how hourly, seasonal or even decadal changes 
in the stratosphere affect weather and climate. 

A drone that could carry more instru- 
ments is a future possibility — but for now, 
a piloted craft is preferable and simpler, says 
Ed Warnock, the project’s chief executive. 
Machines cannot yet match the best human 
pilots when it comes to climbing waves in such 
demanding flight conditions, he says. 

Perlan’s backers hope that it can surpass 
27,000 metres in 2017 — and, ultimately, they 
intend another version of the glider to fly 
higher than 30,000 metres, where the air den- 
sity is almost identical to that on Mars’s surface. 
That might provide insight into how winged 
aircraft could fly on the red planet. 
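
The article's altitude figures, including the earlier statement that air at around 27,000 metres has about 2% of its sea-level density, can be checked roughly with the isothermal barometric formula, in which density falls off exponentially with height. The sketch below is a back-of-the-envelope estimate only: the 7-kilometre scale height is an assumed round value, not a number from the Perlan Project, and the real atmosphere is not isothermal.

```python
import math

# Assumed density scale height for a rough isothermal atmosphere (metres).
# Realistic values vary with temperature; 7 km is an illustrative choice.
SCALE_HEIGHT_M = 7000.0


def density_ratio(altitude_m: float, scale_height_m: float = SCALE_HEIGHT_M) -> float:
    """Air density at altitude_m as a fraction of the sea-level value."""
    return math.exp(-altitude_m / scale_height_m)


# Altitudes mentioned in the article: current record, 2017 goal, later goal.
for altitude in (15_445, 27_000, 30_000):
    print(f"{altitude:>6} m: {100 * density_ratio(altitude):.1f}% of sea-level density")
```

With this assumed scale height the estimate at 27,000 metres comes out close to the quoted 2%. The result is sensitive to the scale height chosen, so it should be read as a plausibility check rather than a precise figure.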

For now, engineers and scientists alike are 
just hoping to see the glider soar into the 
stratosphere above the Andes and take data. 
“Everything in the aircraft is experimental. It’s 
a very difficult mission to do right, and to do it
safely is not easy,” Austin says.

US to lift ban on funding for
human-animal hybrids


Researchers in the United States will soon be able to resume chimaera-based projects.


BY SARA REARDON 


Since September 2015, researchers have been banned from receiving funding
from the US National Institutes of Health
(NIH) for adding human stem cells to animal
embryos, creating blends called chimaeras. But
an NIH proposal released on 4 August lifts that
moratorium, with certain exceptions. It also
sets up a panel to review the ethics and
oversight of grant applications.

The proposal shortens the window during
which human cells can be introduced into
non-human primate embryos, disallowing it
before the central nervous system begins to
form. This limits the number of human cells
incorporated into a chimaera’s brain. It also
prohibits breeding animals containing human
cells, preventing growth of a chimaeric embryo
in a non-human womb or the birth of an
animal more humanized than its parents. Grant
applications that fall into a grey area would
undergo a panel review.

The panel will pay particular attention to 
projects involving primates, mammals at very 
early developmental stages or those in which 
human cells could affect an animal's brain. Past 
a certain point, rodent embryos with human
cells that could affect brain development are 
exempt from panel review, because there is 
little chance they would become human-like, 
says Carrie Wolinetz, NIH’s associate director 
for science policy in Washington DC. 

Currently, researchers use chimaeras to 
study early embryonic development and 
human diseases. But a major goal is to engineer 
animals to grow human organs that could then 
be transplanted into patients. 

Unlike in the United States, it is illegal to 
perform such research without approval in the 
United Kingdom, even with private funding. 

Steven Goldman, a neuroscientist at the 


University of Rochester in New York, says that 
the 2015 ban was overkill and is relieved that 
it will be lifted. 

But Ali Brivanlou, a developmental biolo- 
gist at Rockefeller University in New York City, 
says that the new rules should focus on limit- 
ing the percentage of the animal that becomes 
human instead of restricting the timing of 
modifications. 

Bioethicist Françoise Baylis, at Dalhousie
University in Halifax, Canada, worries that 
there are no clear guidelines on how chimae- 
ras should be treated when used as research 
subjects. 

These are the kinds of questions that the 
oversight panel will discuss when reviewing 
grant applications, says Wolinetz. The NIH 
proposal is open for public comment for 
30 days, after which the agency will issue a final 
rule. Wolinetz hopes that it will be ready for the 
January 2017 grant cycle.

GENE EDITING


CRISPR alternative doubted


Reports of irreproducibility multiply, but author stands by his NgAgo gene-editing system.


BY DAVID CYRANOSKI, SHIJIAZHUANG, CHINA 


Controversy is escalating over whether
a gene-editing technique proposed
as an alternative to the popular
CRISPR-Cas9 system actually works.

Three months ago, Han Chunyu, a biologist 
at Hebei University of Science and Technology 
in Shijiazhuang, reported that the enzyme 
NgAgo can be used to edit mammalian genes. 
But scientists are increasingly complaining that 
they cannot replicate the results — although 
one researcher has told Nature that he can. 
Nature Biotechnology, which published the 
research, is investigating the matter. 

Han says he receives dozens of harassing calls 
and texts each day, mocking him and telling him 
that his career is over — but he is convinced that 
the technique is sound. On 8 August, he submit- 
ted a protocol to the online genetic-information 
repository Addgene. He hopes that this will help 
efforts to reproduce his work, but other scien- 
tists say it does not clear things up. 

The stakes are high. Over the past few years, 
the CRISPR-Cas9 system has transformed biol- 
ogy. But it has also made scientists hungry to 
expand the gene-editing toolkit (see ‘A guide to 
the many other ways to edit a genome’). NgAgo 
is one of several methods that have emerged. “A
lot of us are really cheerleading and hoping that
it works,” says geneticist George Church of Harvard
Medical School in Boston, Massachusetts.

CRISPR-Cas9 uses small genetic sequences
to guide an enzyme to cut DNA in a particular
location. In the Nature Biotechnology paper,
Han’s team reports using a wide variety of
genetic sequences to guide NgAgo, which
belongs to the Argonaute (Ago) family of
proteins that others had flagged as potential
gene editors, to edit eight different genes
in human cells and to insert genes at specific
points on chromosomes (F. Gao et al. Nature
Biotechnol. 34, 768-773; 2016).

NgAgo cuts only the target genes, says Han,

[Photo caption: Han Chunyu maintains that the NgAgo enzyme can edit genes.]


BEYOND CRISPR

A guide to the many other ways to edit a genome 


The CRISPR-Cas9 tool enables scientists to 
alter genomes practically at will. It has blazed 
through labs around the world, finding new 
applications in medicine and basic research. 

But the zeal with which researchers jumped 
on a possible new system called NgAgo
earlier this year reveals an undercurrent of 
frustration with CRISPR-Cas9 — and a drive 
to find alternatives. Some are variations on 
the CRISPR theme; others offer new ways to 
edit genomes (see go.nature.com/2bbgxwb 
for more). 


A MINI-ME

CRISPR-Cas9 may one day be used to rewrite 
the genes responsible for genetic diseases. But 
the components of the system — an enzyme 
called Cas9 and a strand of RNA that directs 
the enzyme to the desired sequence — are too 
large to stuff into the genome of the virus most 


commonly used in gene therapy to shuttle 
foreign genetic material into human cells. 
A solution comes in the form of a mini-Cas9,
which was plucked from the bacterium
Staphylococcus aureus. It’s small enough to 
squeeze into the virus used in one of the gene 
therapies currently on the market. Two groups 
have now used the mini-Cas9 in mice to correct 
the gene responsible for Duchenne muscular 
dystrophy. 


EXPANDED REACH 

Cas9 will not cut everywhere it’s directed to: a
certain DNA sequence must be nearby for that 
to happen. This demand is easily met in many 
genomes, but can be a painful limitation for 
some experiments. Researchers are looking to 
microbes to supply enzymes that have different 
sequence requirements to expand the number 
of sequences they can modify.

One such enzyme, called Cpf1, may become
an attractive alternative. Smaller than Cas9, it 
has different sequence requirements and is 
highly specific. 

Another enzyme, called C2c2, targets 
RNA rather than DNA, a feature that holds
potential for studying RNA and combating 
viruses with RNA genomes. 


TRUE EDITORS 
Many labs use CRISPR-Cas9 only to delete 
sections in genes, thereby abolishing their 
function. “People want to declare victory 
like that’s editing,” says George Church, a
geneticist at Harvard Medical School in Boston, 
Massachusetts. “But burning a page of the book 
is not editing the book.” 

Those who want to swap one sequence with 
another face a more difficult task. When Cas9 
cuts DNA, the cell often makes mistakes as it

whereas CRISPR-Cas9 sometimes edits the
wrong genes. And CRISPR-Cas9 requires a 
certain genetic sequence to be near the cutting 
site to initiate its activity, but NgAgo does not, 
which could broaden its applications, adds Han. 

The initial reaction to the work in China was 
laudatory, including a visit to the lab by China 
Central Television. It was overwhelming, says 
Han. He doesn't like to travel and has never left 
China: a trip to visit a collaborator in Hangzhou 
in March was the first time the 42-year-old had 
boarded a plane. Before his paper came out, “I 
was completely unknown”, says Han, who spoke
to Nature at his laboratory and at a restaurant. 

Doubts about the research first surfaced at 
the beginning of July, when Fang Shimin, a 
former biochemist who has become famous 
for exposing fraudulent scientists, wrote on 
his website New Threads (xys.org) that he had 
heard reports of failed reproduction efforts, 
and alleged that Han’s paper was irreproduc- 
ible. Criticism grew on various Chinese sites. 

On 29 July, Gaetan Burgio, a geneticist at the 
Australian National University in Canberra, 
posted thorough details of his failed attempts 
to replicate the experiment on his blog. Nor- 
mally, his posts get a few dozen hits, but this 
one spiked to more than 5,000. 

On the same day, geneticist Lluis Montoliu at 
the Spanish National Centre for Biotechnology 
in Madrid e-mailed his colleagues at the Inter- 
national Society for Transgenic Technologies to 
recommend “abandoning any project involving 
the use of NgAgo”. The e-mail was leaked and 
posted on Fang's website. 

[Image caption: An argonaute protein is one of many alternatives to the CRISPR-Cas9 gene-editing system.]

stitches together the broken ends. This creates
the deletions that many researchers desire.

But researchers who want to rewrite a DNA
sequence rely on a different repair pathway
that can insert a new sequence, a process
that occurs at a much lower frequency than the
error-prone stitching. That low efficiency poses
a problem in many organisms, including some

An online survey by molecular biologist
Pooran Dewari of the MRC Centre of Regenerative
Medicine in Edinburgh, UK, has found only
9 researchers who say that NgAgo works, and
97 who say that it doesn't. And two researchers
who initially reported success with NgAgo in an
online chat group now say they were mistaken.
Debojyoti Chakraborty, a molecular biologist
at the CSIR-Institute of Genomics and Integrative
Biology in New Delhi, says that he repeated
a section of Han’s paper that described using
NgAgo to knock out a gene for a fluorescent
protein. The glow was reduced in his cells, so
Chakraborty assumed that NgAgo had disabled the
gene. But DNA sequencing revealed no evidence
of gene editing. Jan Winter, a PhD student in
genomics at the German Cancer Research Center in
Heidelberg, describes a similar experience.

Han has only got the system to work on cells
cultured in his laboratory. It failed in
cells that he purchased, which he later found
to be contaminated with Mycoplasma bacteria.
Others might be having the same problem, he
says, and some graduate students might not be
being careful with reagents. Winter disagrees:
“I do not think it is a problem of the scientists
doing something wrong.”

plants. “Everyone says the future is editing
many genes at a time, and I think: ‘We can’t
even do one now with reasonable efficiency’,”
says plant scientist Daniel Voytas at the
University of Minnesota in St Paul.

But developments in the past few months
have given Voytas hope. Two groups of
researchers have come up with techniques that
disable Cas9 then tether it to an enzyme that
converts one DNA letter to another. Voytas and
others are hopeful that tethering other enzymes
to the disabled Cas9 will allow different
sequence changes.


PURSUING ARGONAUTES

When researchers claimed in May that they
could use a protein from the Argonaute family
called NgAgo to slice DNA at a predetermined
site without needing a guide RNA or a specific
neighbouring genome sequence (F. Gao et al.
Nature Biotechnol. 34, 768-773; 2016), they
kicked off a wave of excitement. But laboratories
have so far failed to reproduce the results.

Even so, there is still hope that other Argonaute

One researcher in China who doesn’t want
his name to be entangled in the controversy
told Nature that he had tested NgAgo in a
few kinds of cell and found that it was able to
induce genetic mutations at the desired sites,
a finding that he verified by sequencing. He
adds that the process was less efficient than
CRISPR-Cas9, “but, in short, it worked”.

Two more Chinese scientists, who also asked 
not to be named, say they have initial results 
showing that NgAgo works but they still need 
to confirm with sequencing. 

“It might, might work,” says Burgio, “but if so,
it’s so challenging that it’s not worth pursuing.
It won't surpass CRISPR, not by a long shot.”

He says there is little that is new in the revised 
protocol on Addgene. There is a warning to 
maintain levels of magnesium in cells, “but 
that doesn’t make any sense to me”, he says. It
also warns against Mycoplasma contamination. 
But Montoliu, who might now give NgAgo one 
more chance in September, doubts that this 
could account for all the reported problems. 

The failure of NgAgo “would be disappointing”,
says microbiologist John van der Oost of
Wageningen University in the Netherlands, a 
co-author of the 2014 analysis of Argonaute 
proteins that laid the groundwork for their use 
in gene editing (D. C. Swarts et al. Nature 507, 
258-261; 2014). “But then there is work for us 
left to do to see whether other Argonaute systems
can get it to work somehow.”

Last week, Nature Biotechnology sent a 
statement to Nature’s news team, which is 
editorially independent, saying that “several 
researchers” have contacted the journal to 
report that they cannot reproduce the results, 
and that “the journal is following established 
process to investigate the issues”. 

Hebei University says that it will ask Han to 
repeat the experiment so that it can be veri- 
fied by an independent party within a month, 
according to Chinese state media.


Additional reporting by Heidi Ledford. 


proteins could provide a way forward, says 
genome engineer Jin-Soo Kim at the Institute 
for Basic Science in Seoul. 


PROGRAMMING ENZYMES 
Other gene-editing systems are also in the 
pipeline, although some have lingered there 
for years. For an extensive bacterial project, 
Church’s lab did not reach for CRISPR at all. 
Instead, the team relied heavily on a system 
called lambda Red, which can be programmed 
to alter DNA sequences without the need for 
a guide RNA. But despite being studied for 
13 years in Church’s lab, lambda Red works 
only in bacteria. 

Church and Feng Zhang, a bioengineer 
at the Broad Institute of MIT and Harvard in 
Cambridge, Massachusetts, say that their labs 
are also working on developing enzymes called 
integrases and recombinases for use as gene 
editors. “By exploring the diversity of enzymes, 
we can make the genome-editing toolbox 
even more powerful,” says Zhang. “We have to 
continue to explore the unknown.” Heidi Ledford

ARCHAEOLOGY


Coastal route for first Americans


Life in Canadian corridor was too late to sustain migrations of Clovis and pre-Clovis people.


BY EWEN CALLAWAY 


Archaeologists need a new theory for
the colonization of the Americas. Plant
and animal DNA buried under two 
Canadian lakes squashes the idea that the first 
Americans travelled through an ice-free cor- 
ridor that extended from Alaska to Montana. 
The analysis, published online in Nature on 
10 August and led by palaeogeneticist Eske 
Willerslev of the University of Copenhagen, 
suggests that the passageway became habitable 
12,600 years ago (M. W. Pedersen et al. Nature 
http://dx.doi.org/10.1038/nature19085; 2016). 
That’s nearly 1,000 years after the formation 
of the Clovis culture — once thought to be the 
first Americans — and even longer after other, 
pre-Clovis cultures settled the continents. 
Some 14,000 years ago, glaciers in central 
Canada receded, before the appearance of Clo- 
vis people across what is now the central United 
States. “That coincidence seemed too powerful 


to ignore,” says archaeologist and co-author 
David Meltzer of Southern Methodist Univer- 
sity in Dallas, Texas. 

The ice-free-corridor theory began to crack
in the 1990s, when researchers made a case that
humans lived at Monte Verde in Chile more
than 14,000 years ago. The discovery of other
possible pre-Clovis sites in North America further
shook the theory that Clovis people were
the first Americans. But the idea that their
ancestors at least trekked through the corridor
persisted, says Meltzer, even though there was
little consensus on when the passage opened or
when it became habitable. “It’s 1,500 kilometres.
You can’t pack a lunch and do it in a day.”

To build a picture of the habitat as it crept 
out of the Ice Age, Willerslev’s team analysed 
DNA in cores taken from beneath two lakes 


in what was the last stretch of the corridor to 
melt. The first plant life — thin grasses and 
sedges — dates back just 12,600 years. The 
region later became lusher, with sagebrush, 
buttercups and even roses, followed by willow 
and poplar trees. This habitat attracted bison 
first, and later mammoths, elk, voles and the 
occasional bald eagle. Around 11,500 years 
ago, the corridor began to resemble the pine 
and spruce boreal forests of today’s landscape. 

The region’s bounty must eventually have 
tempted hunter-gatherers. But the dates rule 
out its use as a corridor by Clovis people and 
earlier groups to colonize the Americas, says 
Willerslev. Instead, both probably skirted the 
Pacific coast, perhaps by boat. 

Loren Davis, an archaeologist at Oregon 
State University in Corvallis, agrees: “Now 
that the ice-free corridor has been shown to 
be dead in the water — no pun intended — 
we can start to look at something like a coastal 
migration route.”

THE BANDWIDTH BOTTLENECK


Researchers are scrambling to repair and expand data pipes worldwide 
— and to keep the information revolution from grinding to a halt. 


BY JEFF HECHT 


On 19 June, several hundred thousand US fans of the
television drama Game of Thrones went online to
watch an eagerly awaited episode — and triggered 
a partial failure in the channel's streaming service. Some 
15,000 customers were left to rage at blank screens for more 
than an hour. 
The channel, HBO, apologized and promised to avoid 
a repeat. But the incident was just one particularly public 
example of an increasingly urgent problem: with global 
Internet traffic growing by an estimated 22% per year, the 
demand for bandwidth is fast outstripping providers’ best 
efforts to supply it. 
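
To give a sense of what 22% annual growth implies, the sketch below applies ordinary compound-growth arithmetic. It assumes the rate stays constant from year to year, which the article does not claim, and the function names are illustrative.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Multiple by which traffic grows after `years` at a constant annual rate."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for traffic to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


rate = 0.22  # the article's estimate for global Internet traffic
print(f"Doubling time: {doubling_time_years(rate):.1f} years")
print(f"Growth over 10 years: {growth_factor(rate, 10):.1f}x")
```

At a steady 22% per year, traffic would double roughly every three and a half years and grow about sevenfold in a decade, which is the scale of demand that providers have to plan for.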


Although huge progress has been made since the 1990s, 
when early web users had to use dial-up modems and 
endure ‘the world wide wait’, the Internet is still a global
patchwork built on top of a century-old telephone system. 
The copper lines that originally formed the system's core 
have been replaced by fibre-optic cables carrying trillions 
of bits per second between massive data centres. But service 
levels are much lower on local links, and at the user end it 
can seem like the electronic equivalent of driving on dirt 
roads. 

ILLUSTRATION BY RICHARD WILKINSON

The resulting digital traffic jams threaten to throttle
the information-technology revolution. Consumers
can already feel those constraints when mobile-phone calls become
garbled at busy times, data connections slow to a crawl in crowded 
convention centres and video streams stall during peak viewing hours. 
Internet companies are painfully aware that today’s network is far from 
ready for the much-promised future of mobile high-definition video, 
autonomous vehicles, remote surgery, telepresence and interactive 3D 
virtual-reality gaming. 

That is why they are spending billions of dollars to clear the traf- 
fic jams and rebuild the Internet on the fly — an effort that is widely 
considered to be as crucial for the digital revolution as the expansion of 
computer power. Google has partnered with 5 Asian telecommunica- 
tion companies to lay an 11,600-kilometre, US$300-million fibre-optic 
cable between Oregon, Japan and Taiwan that started service in June. 
Microsoft and Facebook are laying another cable across the Atlantic, 
to start service next year. “Those companies are making that funda- 
mental investment to support their businesses,” says Erik Kreifeldt, a 
submarine-cable expert at telecommunications market-research firm 
TeleGeography in Washington DC. These firms can't afford bottlenecks. 

Laying new high-speed cable is just one improvement. Researchers 
and engineers are also trying several other fixes, from speeding up 
mobile networks to turbo-charging the servers that relay data around 
the world. 


THE FIFTH GENERATION 

For the time being, at least, one part of the expansion problem is 
comparatively easy to solve. Many areas in Europe and North America 
are already full of ‘dark fibre’: networks of optical fibres that were laid 
down by over-optimistic investors during the Internet bubble that 
finally burst in 2000, and never used. Today, providers can often meet 
rising demand simply by starting to use some of this dark fibre. 

But such hard-wired connections don't help with the host of mobile 
phones, fitness trackers, virtual-reality headsets and other gadgets now 
coming online. Data traffic from mobile devices is increasing by an 
estimated 53% per year — most of which will end up going through 
mobile-phone towers, or ‘base stations’, whose
coverage is already spotty, and whose band- 
width has to be shared by thousands of users. 
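To put the growth figures in perspective, the short Python sketch below converts an annual growth rate into a doubling time. It assumes, for illustration only, that the estimated rates of 22% (all Internet traffic) and 53% (mobile data) stay constant from year to year; the function name is hypothetical.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for traffic to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Growth estimates quoted in the article; constancy over time is an assumption.
overall_years = doubling_time_years(0.22)
mobile_years = doubling_time_years(0.53)

print(f"All traffic doubles in about {overall_years:.1f} years")
print(f"Mobile traffic doubles in about {mobile_years:.1f} years")
```

Under that assumption, overall traffic doubles in roughly three and a half years, and mobile traffic in well under two.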

The quality is spotty, as well. First-genera- 
tion mobile-phone networks, introduced in 
the 1980s, used analogue signals and are long 
gone. But second-generation (2G) networks, 
which added digital services such as texting in 
the early 1990s, still account for 75% of mobile 
subscriptions in Africa and the Middle East, 
and are only now being phased out elsewhere. 
As of last year, the majority of mobile-phone 
users in Western Europe were on 3G networks, 
which were launched in the late 1990s to allow for more sophisticated 
digital services such as Internet access. 

The most advanced commercial networks are now on 4G, which was 
introduced in the late 2000s to provide smartphones with broadband 
speeds of up to 100 megabits per second, and is now spreading fast. But 
to meet demand expected by the 2020s, say industry experts, wireless 
providers will have to start deploying fifth-generation (5G) technol- 
ogy that is at least 100 times faster, with top speeds measured in tens of 
billions of bits per second. 

The 5G signals will also need to be shared much more widely than 
is currently feasible, says Rahim Tafazolli, head of the Institute for 
Communication Systems at the University of Surrey in Guildford, 
UK. “The target is how can we support a million devices per square 
kilometre,” he says — enough to accommodate a burgeoning ‘Internet 
of Things’ that will range from networked household appliances to 
energy-control and medical-monitoring systems, and autonomous 
vehicles (see ‘Bottleneck engineering’). 

The transition to 5G, like those to 3G and 4G before it, is being coor- 
dinated by an industry consortium that has retained the name Third 
Generation Partnership Project (3GPP). Tafazolli is working with this
consortium to test a technique known as multiple-input, multiple- 
output (MIMO) — basically, a way to make each radio frequency carry 
many streams of data at once without letting them mix into gibberish. 
The idea is to put multiple antennas on both transmitter and receiver, 
creating many ways for signals to leave one and arrive at the other. 
Sophisticated signal processing can distinguish between the various 
paths, and extract independent data streams from each. 
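The idea of untangling several streams that share one frequency can be sketched with a toy example. The Python snippet below models two transmit and two receive antennas with a made-up 2×2 channel matrix and recovers the two streams by inverting that matrix, an approach usually called zero-forcing. This is an illustrative assumption rather than the consortium's method: real systems work with noisy, complex-valued channels that must be estimated continuously.

```python
def received(H, x):
    """Each receive antenna hears a weighted mix of both transmitted streams."""
    return [
        H[0][0] * x[0] + H[0][1] * x[1],
        H[1][0] * x[0] + H[1][1] * x[1],
    ]


def separate(H, y):
    """Undo the mixing by inverting the 2x2 channel matrix (zero-forcing)."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if abs(det) < 1e-12:
        raise ValueError("signal paths are too similar to tell apart")
    return [
        (H[1][1] * y[0] - H[0][1] * y[1]) / det,
        (-H[1][0] * y[0] + H[0][0] * y[1]) / det,
    ]


# Hypothetical path gains between antennas (rows: receivers, columns: transmitters).
H = [[0.9, 0.3],
     [0.2, 0.8]]
sent = [1.0, -1.0]            # one symbol on each of the two streams
mixed = received(H, sent)     # what the receive antennas actually pick up
recovered = separate(H, mixed)
print(mixed, recovered)
```

The separation works only when the paths differ enough for the matrix to be invertible, which is why distinct antenna positions matter.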

MIMO is already used in Wi-Fi and 4G networks. But the small size of 
smartphones currently limits them to no more than four antennas each, 
and the same number on base stations. So a key goal of 5G research is to 
squeeze more antennas onto both. 

Big wireless companies have demonstrated MIMO with very high 
antenna counts in the lab and at trade shows. At the Mobile World Con- 
gress in Barcelona, Spain, in February, equipment-maker Ericsson ran 
live indoor demonstrations of a multiuser massive MIMO system, using
a 512-element antenna to transmit 25 gigabits per second between a pair 
of terminals, one stationary and the other moving on rails. The system is 
one-quarter of the way to the 100-gigabit 5G target, and it transmits at 
15 gigahertz, part of the high-frequency band planned for 5G. Japanese 
wireless operator NTT DoCoMo is working with Ericsson to test the
equipment outdoors, and Korea Telecom is planning to demonstrate 
5G services when South Korea hosts the next Winter Olympics, in 2018. 

Another approach is to make the devices much more adaptive. Instead 
of operating on a single, hard-wired set of frequencies, a mobile device
could use what is sometimes called cognitive radio: a device that uses 
software to switch its wireless links to whatever radio channel happens 
to be open at that moment. That would not only keep data automatically 
moving through the fastest channels, says Tafazolli, but also improve 
network resilience by finding ways to route around failure points. And, 
he says, it’s much easier to upgrade performance by replacing software 
than by replacing hardware. 
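The channel-hopping logic can be reduced to a few lines of Python. The sketch below is a deliberately simplified assumption about how such software might choose a link: it picks whichever currently open channel offers the highest data rate. The channel names, rates and the `pick_channel` function are all hypothetical.

```python
from typing import Optional


def pick_channel(channels: dict) -> Optional[str]:
    """Return the open channel with the highest data rate, or None if all are busy."""
    open_channels = {name: info for name, info in channels.items() if info["open"]}
    if not open_channels:
        return None
    return max(open_channels, key=lambda name: open_channels[name]["rate_mbps"])


# Hypothetical snapshot of what the radio senses at one moment.
snapshot = {
    "channel_a": {"open": False, "rate_mbps": 300},
    "channel_b": {"open": True, "rate_mbps": 120},
    "channel_c": {"open": True, "rate_mbps": 80},
}

best = pick_channel(snapshot)
print(best)
```

If the chosen channel later fails or fills up, calling the function again on a fresh snapshot routes around it, which is the resilience benefit described above.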

Meanwhile, a crucial policy challenge for the 5G transition is find- 
ing a radio spectrum that offers adequate bandwidth and coverage. 
International agreements have already allo- 
cated almost every accessible frequency to a 
specific use, such as television broadcasting, 
maritime navigation or even radio astronomy. 
So final changes will have to wait for the 2019 
World Radiocommunication Conference. But 
the US Federal Communications Commis- 
sion (FCC) is trying to get a head start by 
auctioning off frequencies below 1 gigahertz 
to telecommunications companies. Once 
reserved for broadcast television because they 
are better than higher frequencies at pene- 
trating walls and other obstructions — but no 
longer needed after television's shift to digital — these low frequen- 
cies are particularly attractive for serving sparsely populated areas, 
says Tafazolli: only a few base stations would be required to provide 
broadband service to households and driving data to autonomous 
cars on motorways. 

Other bands in the 1-6-gigahertz range could be opened up for 5G 
use as 2G and 3G technologies are phased out. But the best hope for 
dense urban areas is to exploit frequencies above 6 gigahertz, which are 
currently little-used because they have a very short range. That would 
require 5G base stations up to every 200 metres in dense urban areas, 
one-fifth the spacing typical of urban 4G networks. But the FCC con- 
siders the idea promising enough that on 14 July, it formally approved 
opening these frequencies for high-speed, fast-response services. 

Ofcom, the UK regulatory body, is considering similar steps.

NATURE.COM: To listen to a podcast about the bandwidth challenge, visit go.nature.com/2axbk00

Companies are particularly interested in
these higher frequencies as a way to extend 5G
technology for other uses. In the United States,
wireless carrier Verizon and a consortium
of equipment-makers including Ericsson, Cisco, Intel, Nokia and
Samsung have tested 28-gigahertz transmission at sites in New Jersey, 
Massachusetts and Texas. The system uses 5G technology to deliver 
data at 1 gigabit per second, and Verizon is adapting it for use in fixed 
wireless connections to homes, which it plans to test next year. The 
company has been pushing fixed wireless as an alternative to wired 
connections, because connection costs are much lower. 


BIGGER PIPES 

“When I take out my cell phone, everyone thinks of it as a wireless com- 
munications device,” says Neal Bergano, chief technology officer of TE 
SubCom, a submarine-cable manufacturer based in Eatontown, New 
Jersey. Yet that is only part of the story, he says: “Users are mobile, but the 
network isn't mobile.” When someone uses their phone, its radio signal
is converted at the nearest base station to an optical signal that then has 
to travel to its destination through fixed fibre optics. 

These flexible glass data channels have been the backbone of the 
global telecommunications network for more than a quarter of a cen- 
tury. Nothing can match their bandwidth: today, a single hair-thin fibre 
can transmit 10 terabits (trillion bits) per second across the Atlantic. 
That is the equivalent of 25 double-layer Blu-ray Discs per second, and 
is 30,000 times the capacity of the first transatlantic fibre cable, laid in 
1988. Much of that increase came when engineers learned how to send 
100 separate signals through a single fibre, each at its own wavelength. 
But as traffic continues to increase over heavily used routes, such as New 
York to London, that approach is coming up against some hard limits: 
distortion and noise that inevitably build up as light passes along thou- 
sands of kilometres of glass have made it effectively impossible to send 
more than 100 gigabits per second on a single wavelength. 
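The per-fibre figures in this paragraph can be cross-checked with simple arithmetic. The Python sketch below assumes a double-layer Blu-ray Disc holds 50 gigabytes (decimal) and that each of the 100 wavelengths runs at the 100-gigabit-per-second ceiling mentioned above.

```python
GIGA = 10**9
TERA = 10**12

# Assumption: a double-layer Blu-ray Disc stores 50 gigabytes.
DISC_BYTES = 50 * GIGA
discs_per_second = 25
bluray_equivalent_bps = discs_per_second * DISC_BYTES * 8  # bytes to bits

# 100 wavelengths, each carrying 100 gigabits per second.
wavelengths = 100
per_wavelength_bps = 100 * GIGA
fibre_bps = wavelengths * per_wavelength_bps

print(bluray_equivalent_bps / TERA, "terabits per second (Blu-ray equivalent)")
print(fibre_bps / TERA, "terabits per second (100 wavelengths)")
```

Both routes arrive at 10 terabits per second, which shows why the single-wavelength limit caps the capacity of a fibre unless more wavelengths or better fibre are used.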

To overcome that limit, manufacturers have developed a new type of 
fibre. Whereas standard fibres send the light through a 9-micrometre- 
wide core of ultrapure glass running down the middle, the newer design 
spreads the light over a larger core area at lower intensity, reducing noise. 
The trade-off is that the new fibres are more sensitive to bending and 
stretching, which can introduce errors. But they work very well in sub- 
marine cables, because the deep sea provides a benign, stable environ- 
ment that puts little strain on the fibre. 

Last year, networking-systems firm Infinera in Sunnyvale, Califor- 
nia, sent single-wavelength signals at 150 gigabits per second through 
a large-area fibre for 7,400 kilometres — more than 3 times the distance 
possible with a standard fibre, and easily enough to cross the Atlantic. 
They also transmitted 200-gigabit-per-second signals a shorter distance. 

The highest-capacity commercial submarine cable now in service is 
the 60-terabit-per-second FASTER system that opened in June between 
Oregon and Japan. It sends 100-gigabit-per-second signals on 100 wave- 
lengths in each of 6 pairs of large-core fibres. But in late May, Microsoft 
and Facebook jointly announced plans to beat it with MAREA: a large- 
area fibre cable spanning the 6,600 kilometres between Virginia and 
Spain. When completed in October 2017, the cable will link the two 
companies’ data centres on opposite sides of the Atlantic at 160 tera- 
bits per second. 
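The headline capacity of a cable follows from multiplying three numbers. The Python sketch below applies that to the FASTER figures quoted above and compares the result with the planned MAREA cable; the function name is hypothetical, and the calculation ignores any overhead.

```python
GBPS = 10**9
TBPS = 10**12


def cable_capacity_bps(per_wavelength_bps: int, wavelengths: int, fibre_pairs: int) -> int:
    """Total capacity: rate per wavelength x wavelengths per fibre x fibre pairs."""
    return per_wavelength_bps * wavelengths * fibre_pairs


faster_bps = cable_capacity_bps(100 * GBPS, wavelengths=100, fibre_pairs=6)
marea_bps = 160 * TBPS  # planned capacity quoted in the article

print(f"FASTER: {faster_bps / TBPS:.0f} Tb/s")
print(f"MAREA is {marea_bps / faster_bps:.2f} times larger")
```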

Another approach to reducing performance-limiting noise was dem- 
onstrated last year by a group at the University of California, San Diego. 
Fibre-optic systems normally use separate lasers for each wavelength, 
but tiny, random variations can generate noise. Instead, the group used a 
technique known as a frequency comb to generate a series of uniformly 
spaced wavelengths from a single laser (E. Temprana et al. Science 348, 
1445-1448; 2015). “It worked like a charm” to reduce noise, says group 
member Nikola Alic, an electrical engineer. With further development, 
he says, the approach could double the data rate of fibre-optic systems. 


TIME OF FLIGHT 

Impressive bandwidth is useful, but promptness also matters. Human 
speech is so sensitive to interruption that a delay of one-quarter of a 
second can disturb a phone or video conversation. Video requires a
fixed frame rate, so streaming video stalls when its input queue runs
dry. To overcome such problems, FCC rules allow special codes that
give priority passage for packets of data carrying voice calls or video
frames, so that they flow quickly and uniformly through the Internet.

[Figure ‘Bottleneck engineering’: The Internet was built on a century-old telephone system, leaving many choke points that have to be eliminated to keep the bits flowing. Panel ‘Mobile everything’: Demand for wireless connections is exploding, with ever more devices coming online. Engineers hope to meet that demand with fifth-generation (5G) networks that will increase data rates from millions to billions of bits per second. Panel ‘Cloud computing’: Much of the world’s digital information is moving to the cloud: a global network of data centres that are linked together with high-capacity fibre-optic cables. Building more data centres and introducing higher-capacity cables promise to make the cloud more responsive. Labels: hard-wired networks; data centre (the centres copy each other’s data to keep information close to users); long-distance fibre-optic cables (cables stretching across the oceans link land-based networks into a global system).]

New and emerging services including telerobotics, remote surgery, 
cloud computing and interactive gaming are also sensitive to network 
responsiveness. The time it takes for a signal to make a round trip 
between two terminals, often called latency, depends largely on dis- 
tance — a reality that shapes the geography of the Internet. Even though 
data travel through fibre-optic cable at 200,000 kilometres per second, 
two-thirds the velocity of light in the open air, a person tapping a key in 
London would still need 86 milliseconds to get a response from a data 
centre in San Francisco, 8,600 kilometres away — a delay that would 
make cloud computing crawl. 
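The 86-millisecond figure is a best-case round trip, and it can be reproduced in a few lines of Python. The sketch assumes a straight fibre run at the quoted signal speed and ignores routing, switching and server processing time, all of which add further delay in practice. The function name is hypothetical.

```python
FIBRE_SPEED_KM_PER_S = 200_000  # signal speed in fibre quoted in the article


def round_trip_ms(distance_km: float, speed_km_per_s: float = FIBRE_SPEED_KM_PER_S) -> float:
    """Best-case round-trip time in milliseconds over a direct fibre path."""
    return 2 * distance_km / speed_km_per_s * 1000


london_sf_ms = round_trip_ms(8_600)
print(f"London to San Francisco and back: {london_sf_ms:.0f} ms")
```

Because the delay scales with distance, the only way to cut it substantially is to move the data closer to the user.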

Emerging mobile applications require both broad bandwidth and low 
latency. Autonomous cars, for example, need real-time data on their 
environment to warn them about hazards, from potholes to accidents 
ahead. Conventional cars are becoming wireless nerve centres, needing 
low latency for ‘hands-free’ voice-control systems. 

A potentially huge challenge is the emergence of 3D virtual-reality 
systems. Interactive 3D gaming requires data to travel at 1 gigabit per 
second — 20 times the speed of a typical video feed from a Blu-Ray 
Disc. But most crucially, the image must be rewritten at least 90 times 
per second to keep up with users turning their heads to watch the action, 
says computer scientist David Whittinghill of Purdue University in West 
Lafayette, Indiana. If the data stream slips behind, the user gets motion 
sickness. To keep that from happening, Whittinghill has installed a 
special 10-gigabit-per-second fibre line to his virtual-reality lab. 
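The numbers in this paragraph imply a tight per-frame budget. The Python sketch below works out how long each frame has to arrive at 90 frames per second, how much data a 1-gigabit-per-second stream delivers per frame, and the Blu-ray feed rate implied by the ‘20 times’ comparison. It assumes the data are spread evenly across frames.

```python
FRAME_RATE_HZ = 90              # minimum refresh rate quoted in the article
STREAM_BPS = 1_000_000_000      # 1 gigabit per second

frame_budget_ms = 1000 / FRAME_RATE_HZ         # time available per frame
bits_per_frame = STREAM_BPS / FRAME_RATE_HZ    # data delivered per frame
implied_bluray_feed_bps = STREAM_BPS / 20      # from the '20 times' comparison

print(f"Frame budget: {frame_budget_ms:.1f} ms")
print(f"Data per frame: {bits_per_frame / 1e6:.1f} megabits")
print(f"Implied Blu-ray feed: {implied_bluray_feed_bps / 1e6:.0f} megabits per second")
```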

To speed up responses, big Internet companies such as Google, 
Microsoft, Facebook and Amazon store replicas of their data in multiple 
server farms around the world, and route queries to the closest. Video 
cached at a local data centre is what allows viewers to fast-forward as 
if the file was stored on a home device, says Geoff Bennett, director of 
solutions and technology for Infinera. But the proliferation of these data 
centres is also one of the biggest drivers of bandwidth demand, he says: 
vendors’ efforts to synchronize private data centres around the world 
now consume more bandwidth than public 
Internet traffic. The Microsoft-Facebook 
cable is being built expressly for this purpose 
(see ‘The submarine web’).

So far, most data centres are where the 
customers and cables are: in North America, 
Europe and east Asia. “Many parts of the world 
still rely on remote access to content that is not stored locally,” says 
Kreifeldt. South America has few data centres, he says, so much of the 
content comes from well-wired Miami, Florida: traffic between Chile 
and Brazil might be routed through Miami to save money, but at a cost 
in latency. The same problem plagues the Middle East, where 85% of 
international traffic must travel to centres in Europe. That is changing, 
says Kreifeldt, but progress is slow. Amazon Web Services launched its 
first cloud data centre in India this year, in Mumbai; it has had a similar 
centre in Sao Paulo, Brazil, since 2011. 


INTERNAL COMMUNICATIONS 

Bandwidth is also crucial on the very smallest scale: on and between 
the chips in the banks of servers in a data centre. Expanding the flow 
here can help information to move more quickly within the data cen- 
tres and get out to users faster. Chip clock speeds — how fast the 
chip runs — flat-lined at a few gigahertz several years ago, because 
of heating problems. The most practical way to speed up processors 
significantly is to divide the operations that they perform between 
multiple ‘cores’: separate microprocessors operating in parallel on the 
same chip. That requires high-speed connections within the chip,
and one way to make them is with light, which can move data faster
than electrons can.

[Figure caption: Much of the world’s Internet traffic passes under the oceans, through fibre-optic cables that can run along the sea bed for thousands of kilometres. Companies are constantly laying more and better cables.]

The biggest obstacle has been integrating microscale optics with 
silicon electronics. After years of research on ‘silicon photonics’, engineers
have yet to find a way to efficiently generate light from silicon,
a key step in optical information processing. The best semiconductor 
light sources, such as indium phosphide, can be bonded to silicon chips, 
but are very difficult to grow directly on silicon, because their atoms are 
spaced differently. Optical and electronic components have been integrated
on indium phosphide, but so far only on a small scale.

In an effort to scale up photonic integration to a commercial level, 
the United States last year launched the American Institute for Man- 
ufacturing Integrated Photonics in Rochester, New York, which is 
supported by $110 million from federal agencies and $502 million 
from industry and other sources. Its target 
is to develop an efficient technology to make 
integrated photonics for high-speed applica- 
tions, including optical communications and 
computing. 

Separately, a Canadian-funded team earlier 
this year demonstrated a photonic integrated 
circuit with 21 active components that could be programmed to per- 
form 3 different logic functions (W. Liu et al. Nature Photon. 10, 190- 
195; 2016). That’s an important step for photonic microprocessors, 
comparable in complexity to the first programmable electronic chips 
that opened the door to microcomputers. “Compared to current elec- 
tronics, it’s simple, but compared to photonic integrated circuits it is 
quite complicated,” says study co-author Jianping Yao, an electrical 
engineer at the University of Ottawa in Canada. 

Further development could lead to varied applications. For example, 
Yao says that after the chip is optimized for manufacture, it could con- 
vert a 5G smartphone signal received at a base station into an analogue 
optical signal, which could be transmitted by fibre optics to a central 
facility, and then digitized. 

The quest for faster chips, like other parts of the Internet problem, 
is a daunting challenge. But researchers such as Bergano see a lot of 
potential for improvements. After 35 years of working on fibre optics, 
he says, “I remain a complete optimist when I think about the future.”


Jeff Hecht is a freelance writer in Auburndale, Massachusetts. 
The ravages of guns,
nets and bulldozers 


The threats of old are still the dominant drivers of current species loss, 
indicates an analysis of IUCN Red List data by Sean Maxwell and colleagues. 


There is a growing tendency for media
reports about threats to biodiversity 
to focus on climate change. 

Here we report an analysis of threat 
information gathered for more than 8,000 
species. These data revealed a contrasting 
picture. We found that by far the biggest 
drivers of biodiversity decline are overex- 
ploitation (the harvesting of species from 
the wild at rates that cannot be compen- 
sated for by reproduction or regrowth) and 
agriculture (the production of food, fodder,
fibre and fuel crops; livestock farming;
aquaculture; and the cultivation of trees). 
Early next month, representatives from 
government, industry and non-govern- 
mental organizations will define future 
directions for conservation at the World 
Conservation Congress of the Interna- 
tional Union for Conservation of Nature 
(IUCN). High on the agenda for politi- 
cal leaders, non-governmental organiza- 
tions, conservationists and many others 
will be taking steps to turn the 2015 Paris
climate agreement into action. It is also
crucial that the World Conservation Con- 
gress delegates — and society in general 
— ensure that efforts to address climate 
change do not overshadow more immedi- 
ate priorities for the survival of the world’s 
flora and fauna. 


ON THE LIST 

Since 2001, the categories and criteria of the
IUCN Red List of Threatened Species (a
standard for the evaluation of extinction
risk) have guided assessments, now for
82,845 species. Assessors assign species to
categories, including ‘near-threatened’, ‘vulnerable’,
‘endangered’ or ‘critically endangered’,
depending on their population size;
past, current and projected population 
trends; geographic range and other symp- 
toms of extinction risk. Species in the latter 
three groups are collectively referred to as 
‘threatened’. 

To assess the relative prevalence of cur- 
rent hazards to biodiversity, we quantified 
threat information for 8,688 near-threat- 
ened or threatened species belonging to 
species groups in which all known species 
have been assessed (for complete list of taxa 
included, see Supplementary Information; 
go.nature.com/2ajen88). 

The basic message emerging from these 
data is that whatever the threat category or 
species group, overexploitation and agri- 
culture have the greatest current impact on 
biodiversity (see ‘Big killers’). 

Of the species listed as threatened or
near-threatened, 72% (6,241) are being
overexploited for commerce, recreation or
subsistence.

[Figure caption: Overexploitation and agriculture are the most prevalent threats facing the 8,688 threatened or near-threatened species from comprehensively assessed species groups on the IUCN Red List. Annotations: The Sumatran rhinoceros (Dicerorhinus sumatrensis) and Western gorilla (Gorilla gorilla) are being harmed by overexploitation; Africa’s cheetah (Acinonyx jubatus) and Asia’s hairy-nosed otter (Lutra sumatrana) are being imperilled by agricultural activity. The common hippopotamus (Hippopotamus amphibius) and leatherback turtle (Dermochelys coriacea) are being affected by droughts and high temperatures.]

The Sumatran rhinoceros (Dicerorhinus 
sumatrensis), Western gorilla (Gorilla 
gorilla) and Chinese pangolin (Manis 
pentadactyla, a scaly mammal), for 
instance, are all illegally hunted as a result 
of high market demand for their body 
parts and meat. These are just three of 
the more than 2,700 species affected by 
hunting or fishing, or by people collect- 
ing live specimens for the pet trade. At 
the same time, unsustainable logging is 
contributing to the decline of more than 
4,000 forest-dependent species, such as 
the Bornean wren-babbler (Ptilocichla 
leucogrammica), India’s Nicobar shrew 
(Crocidura nicobarica), and the Myanmar 
snub-nosed monkey (Rhinopithecus 
strykeri). 

The expansion and intensification 
of agricultural activity is imperilling 
5,407 species, or 62% of those listed as
threatened or near-threatened. Africa’s
cheetah (Acinonyx jubatus), Asia’s 
hairy-nosed otter (Lutra sumatrana) 
and South America’s huemul deer (Hip- 
pocamelus bisulcus) are among more than 
2,300 species affected by livestock farming 
and aquaculture. And the Fresno kanga- 
roo rat (Dipodomys nitratoides) and the 
African wild dog (Lycaon pictus) are two 
of more than 4,600 species currently under 
threat from land modification associated 
with the production of food, fodder or fuel 
crops. 

Meanwhile, anthropogenic climate 
change — including increases in storms, 
flooding, extreme temperatures or 
drought that exceed background vari- 
ability, as well as sea-level rise — is cur- 
rently affecting 19% of species listed as 
threatened or near-threatened. Hooded 
seals (Cystophora cristata) are among the 
1,688 species affected. These have dropped 
in abundance by 90% in the northeastern 
Atlantic Arctic over the past few decades 
as a result of extensive declines in regional
sea ice, and so in the availability of sites for
resting and raising pups.

[Figure annotation: The Spanish imperial eagle (Aquila adalberti) and giant panda (Ailuropoda melanoleuca) are being harmed by road building.]
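The percentages quoted for each threat can be checked against the species counts given in the text. The Python sketch below divides each count by the 8,688 species analysed; the variable names are illustrative only.

```python
TOTAL_SPECIES = 8_688  # threatened or near-threatened species in the analysis

# Species counts per threat, as quoted in the text.
affected = {
    "overexploitation": 6_241,
    "agriculture": 5_407,
    "climate change": 1_688,
}

shares = {threat: round(100 * count / TOTAL_SPECIES) for threat, count in affected.items()}

for threat, share in shares.items():
    print(f"{threat}: {share}%")
```

The shares add up to more than 100% because a single species can be affected by several threats at once.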


DATA LIMITATIONS 
There are three obvious difficulties in 
interpreting the Red List data. 

First, the patterns we report here do not 
necessarily extend to taxonomic groups that 
haven't been monitored. The comprehen- 
sively assessed groups included here are not 
a random sample from the tree of life, but
those that are generally better-studied. All 
known bird species have been assessed, for 
instance. But information on extinction risk 
has been gathered for only some 0.1% of the 
more than 50,000 species of fungi thought 
to exist. 

A second potential limitation of our
analysis is that it treats threats as discrete
when, in fact, hazards rarely affect organisms
in isolation. Agriculture is a major driver of
greenhouse-gas emissions, for example. And
new roads to enable agricultural expansion
can increase bush-meat harvesting, the
incidence of forest fires and habitat fragmentation.
In fact, more than 80% of the species
included in our analysis are affected by more
than one major threat.

Finally, the balance of threats driving 
extinction risk for many of the world’s spe- 
cies will change, even over the next few 
decades. For Red
List assessments, the 
impacts of future 
threats (including 
climate change) in 
reducing a species’ 
population size are 
projected across three 
generations or over a 
ten-year period — 
whichever is longer. Hence, unless the 
species being assessed is long-lived (with 
an expected lifespan of 30-50 years, say), 
projections cover a period during which the 
effects of climate change, in particular, will 
be relatively modest. 

Yet we do not think that any of these
caveats alter the overall message. Because
agricultural activity and overexploitation
tend to occur in fertile places with naturally
high levels of biodiversity, the patterns
emerging from our analysis probably extend
to many of the other species that have not
yet been assessed. Also, until a better understanding
is obtained of how threats act additively,
synergistically or antagonistically, a
pragmatic course of action is to limit those
impacts that are currently harming the most
species. Finally, studies have shown Red List
categorizations reflecting projected extinction
risk from climate change to be more
robust than was previously thought.


WHAT NEXT? 

Of all the plant, amphibian, reptile, bird and
mammal species that have gone extinct since
AD 1500, 75% were harmed by overexploitation
or agricultural activity or both (often
in combination with the introduction of
invasive alien species). Climate change will
become an increasingly dominant problem
in the biodiversity crisis. But human development
and population growth mean that
the impacts of overexploitation and agricultural
expansion will also increase.

The aim of the World Conservation Con- 
gress is to translate sustainable develop- 
ment and carbon neutrality agreements into 
action. We urge delegates to focus on pro- 
posing and funding actions that prioritize 
the biggest current threats to biodiversity. 

Thankfully, there are effective tools and
approaches to alleviate harm caused by
overexploitation and agricultural activities.
These include the development and
governance of sustainable harvest regimes;
the enforcement of hunting regulations
and no-take marine protected areas;
the maintenance of international policy
mechanisms, such as the Convention on
International Trade in Endangered Species;
and public education (for instance, on where
ivory comes from) to reduce demand. Also
powerful are the establishment of protected
areas to safeguard key biodiversity areas;
the management of agricultural systems in
ways that allow threatened species to persist
within them; the regulation of pesticide
and fertilizer use; the certification of agricultural
sustainability; and the reduction of
food waste, for example, using urban food-transfer
programmes.

Crucially, ensuring that overexploita- 
tion and agricultural activities today do not 
compromise ecosystems tomorrow will help 
to ameliorate the challenges presented by 
impending climate change. Healthy ecosys- 
tems are better repositories for carbon. They 
are also more likely to provide the physical 
connectivity and genetic diversity needed to 
enable species to adapt to the large shifts in 
climate expected later this century.

Conservationists, weary of tackling 
herculean, long-standing problems, could 
be forgiven for being drawn to newer ones. 
Nonetheless, we appeal to all concerned 
with the sustainability of life on Earth to take 
stock of the current balance of threats — and 
refocus their efforts on the enemies of old.
COMMENT | BOOKS & ARTS


MICROBIOLOGY 


Mob rule 


Adrian Woolfson examines four books on the
microbiological universe that churns within us. 


Sydney Brenner gave a talk in Cam- 

bridge, UK, in which he espoused the 
merits of sequencing the human genome 
to fully characterize the human “gene 
kit”. Several years later, in 2001, the first 
draft sequence of the human genome was 
released. The assumption was that human 
form, function and dysfunction would be 
reduced to a finite and tractable problem. 
Over time, this vision has been eroded by 
the discovery of successive Russian-doll- 
like levels of informational and regulatory 
complexity, from epigenetics to microRNAs. 
Genomic protein-encoding genes may rep- 
resent the surface of a much deeper problem. 

The latest assault 


E the early 1990s, molecular biologist 


on Brenner’s model “Our 

of organismal form minuscule 

and function has Passengers 
come fromanunex- ct like puppet 
pected quarter. It masters, 

seems that, instead of manipulating 
being self-contained, how we think, 


the contents of the 
human gene kit are 
generously supplemented by a plethora of 
extraneous components. These riches come 
from the topsy-turvy world of microorgan- 
isms, symbionts whose products bolt onto 
the more modest collection furnished by 
their hosts. The implications of this extra 
informational dimension, and how it inter- 
weaves with our genes, are explored in four 
new books. 

In his compelling I Contain Multitudes, 
science writer Ed Yong plunges into the 
Alice in Wonderland shadow world of 
the microbes that live in and on us. As he 
reminds us, the 30 trillion cells in the human 
body are effortlessly outnumbered by the 
39 trillion or so microbial cells that lurk 
within it. Our own genomes muster 20,000 
protein-encoding genes; our uninvited 
guests may collectively field an impressive 
10 million. We know this thanks to meta- 
genomics — the method of sequencing 
short, species-specific stretches of RNA, pio- 
neered by biophysicist Carl Woese in the late 
1960s — which helps to define the genomic 
architecture of our microbial communities. 
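
The scale of these figures can be checked with a little arithmetic. The following is a minimal sketch that uses only the approximate numbers quoted above; the variable names are illustrative and not taken from the book.

```python
# Approximate figures quoted in the review.
human_cells = 30e12            # about 30 trillion human cells
microbial_cells = 39e12        # about 39 trillion microbial cells
human_genes = 20_000           # protein-encoding genes in the human genome
microbial_genes = 10_000_000   # genes fielded collectively by resident microbes

# Ratios of microbial to human counts.
cell_ratio = microbial_cells / human_cells
gene_ratio = microbial_genes / human_genes

print(f"Microbial cells per human cell: {cell_ratio:.1f}")
print(f"Microbial genes per human gene: {gene_ratio:.0f}")
```

On these estimates, the difference in gene count is far larger than the difference in cell count.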

Bacteria confer unique properties on their 
hosts. Their collective genes, and capacity 
for rapid evolution through high rates of
mutation, horizontal gene transfer and rapid
replication, render them virtuosos of bio- 
chemistry, and providers of rich metabolic 
creativity. This gives organisms a versatility 
far above that afforded by their own genes. 
Aphids, for example, rely on Buchnera-strain 
bacterial symbionts to produce essential 
amino acids absent from the phloem sap that 
is the insects’ food. Such relationships led 
US biologist Ivan Wallin in 1927 to describe 
symbiosis as an engine of novelty that ena- 
bles bacteria to transform their host species. 
Whereas scientists from germ-theory 
pioneer Louis Pasteur to penicillin-devel- 
oper Howard Florey have taught us to fear 
microbes, Yong argues that we must nurture 
them, appreciating that they may help us to 
develop into what we are. The human micro- 
biome should be viewed as a distributed 
organ, performing functions as essential as 
those of our liver, lungs or kidneys. 
Intriguingly, Yong argues that human 
immune cells are akin more to park rang- 
ers than to xenophobes, carefully wrangling 
the microbial zoo, modulating its popula- 
tion dynamics and responding to its chatter. 
The degradation and collapse of coral reefs 
in warm, acidic waters is due not only to 
direct effects of global warming, but also to 
the disruption of relationships in microbial 
communities. Likewise, Yong suggests that 
some human diseases result from alterations 
to bacterial community dynamics, triggering
abnormalities in internal microbial
ecology and cooperativity. An example of this
is obesity, which seems, in part, to result from
an imbalance of gut microbes. Obese individuals
have more bacteria from the phylum
Firmicutes and fewer from the genus Bacteroides
than lean ones, and a relative lack of
Akkermansia muciniphila. It was shown in
2013 that microbes from lean mice can make
obese mice lose weight (A. Everard et al. Proc.
Natl Acad. Sci. USA 110, 9066-9071; 2013).

I Contain Multitudes: The Microbes Within
Us and a Grander View of Life
ED YONG
Ecco: 2016.

The Human Superorganism: How the
Microbiome Is Revolutionizing the Pursuit of
a Healthy Life
RODNEY DIETERT
Dutton: 2016.

This Is Your Brain on Parasites: How Tiny
Creatures Manipulate Our Behavior and
Shape Society
KATHLEEN MCAULIFFE
Houghton Mifflin Harcourt: 2016.

The Mind-Gut Connection: How the Hidden
Conversation Within Our Bodies Impacts
Our Mood, Our Choices, and Our Overall
Health
EMERAN MAYER
Harper Wave: 2016.

Lactobacillus bacteria help to make human intestines
hostile to pathogens.

Yong goes on to explain how the dialogue 
between cells and resident microbes may 
affect organismal development. Hawaiian 
bobtail squid (Euprymna scolopes) adopt 
their mature form only in the presence of the 
luminescent bacterium Vibrio fischeri, which 
colonizes the squid’s light organ. Human 
breast milk contains indigestible oligosaccha- 
rides, the favoured food of Bifidobacterium 
longum infantis, which releases short-chain 
fatty acids that influence the permeability of 
an infant's gut cells. 

In The Human Superorganism, immuno- 
toxicologist Rodney Dietert goes further, 
asserting that Homo sapiens is a super- 
organism containing thousands of micro- 
bial species. He argues that the biology of 
microbes will eventually challenge our view 
of what it means to be human, and lead to 
the identification of therapeutic agents. In 
his vision, humans are “microbial storage 
machines” designed to pass microorganisms 
to future generations. Our “second genome” 
— the genes encoded by our microbiome — 
resides in a thriving bacterial community 
that he compares to the diversity of a tropi- 
cal rainforest. Even in the age of genome- 
editing tools such as CRISPR, it remains 
challenging to modify the human genome. 
Dietert is astute, however, in suggesting that 
microbial genomes could be engineered to 
introduce functionalities and tackle human 
diseases. The ability of microbial metabo- 
lites to manipulate the expression of human 
genes has already been established: sodium 
butyrate, for example, helps to control the 
switch from embryonic to fetal haemoglobin.

Not content to cruise around their 
luxury human condos, microorganisms 
also hack into our nervous systems. In her 
eye-opening, entertaining and slightly dis- 
concerting This Is Your Brain on Parasites, 
journalist Kathleen McAuliffe contends 
that our minuscule passengers act like pup- 
pet masters, manipulating how we think, 
feel and act. I will think of cats differently 
now that I am aware that their parasite 
Toxoplasma gondii may have the ability to 
affect human behaviour, and is implicated 
in mental illnesses such as schizophrenia. 
Men harbouring this parasite are, further- 
more, more inclined to break rules, and are 
more reserved and suspicious. McAuliffe 
ingeniously suggests that the psychoactive 
chemicals produced by microbes could be 
used to develop mind-altering medicines. 

Focusing on how the microbiome may 
cause chronic conditions such as persistent 
pain and irritable bowel syndrome, gastro- 
enterologist Emeran Mayer’s The Mind-Gut 
Connection depicts the brain, the gut and its 
microorganisms as a unitary structure tightly 
knit, anatomically and chemically. He asserts, 
albeit with rudimentary evidence, that the 
enteric nervous system — the mesh of neu- 
rons that governs the gastrointestinal system 
— functions as a mini-brain, relaying sensory 
information from the gut to the central nerv- 
ous system. It was fascinating to learn that 
microbes contain ancient versions of many 
signalling peptides and hormones found 
in the human alimentary tract, including 
noradrenaline, serotonin and endorphins. 
That may argue in favour of his thesis. Mayer 
speculates that early programming errors in 
the putative brain-gut-microbiome axis can 
result in medical conditions that might ben- 
efit from treatment with probiotics. 

We are descended from microbes, have 
evolved around them, and incorporate ele- 
ments of them into our cells. Microbiome 
profiling is certain to become as routine as 
blood testing, and the extensive treasure 
chest of bacterial molecules will doubtless be 
used to change the way we are. Our micro- 
bial companions may even influence our 
responses to important medicinal agents, 
such as the anti-PD-L1 and anti-CTLA4 
drugs that reinvigorate the immune systems 
of people with cancer. Several regional initia- 
tives, including the US Human Microbiome 
Project and National Microbiome Initiative, 
have been established to study the human 
microbiome. The complexities of cataloguing, 
mapping and characterizing microbial biol- 
ogy on a worldwide scale promise to make 
sequencing the human genome look easy. A 
global programme seems to beckon.


Adrian Woolfson is the author of Life 
Without Genes. 
e-mail: adrianwoolfson@yahoo.com 
Books in brief


Bird Brain: An Exploration of Avian Intelligence 

Nathan Emery Ivy (2016)

Cognitive biologist Nathan Emery has been on the cutting edge of 
research into avian intelligence since the 1990s. In this sparkling, 
superbly illustrated summation of the cognitive science, ethology and 
hot debates, Emery encapsulates the “feathered ape”. He compares 
the avian brain to the mammalian to reveal functional similarities in 
disparate anatomies (likened to fruitcake and layer cake, respectively) 
and tours spatial memory, migratory sense, tool use and more. From 
the wattle-bopping of black grouse (Tetrao tetrix) to the dung baiting of
burrowing owls (Athene cunicularia), a masterful explication. 


Patient H.M.: A Story of Memory, Madness, and Family Secrets 
Luke Dittrich RANDOM HOUSE (2016) 

In 1953, experimental surgery left Henry Molaison with severe 
amnesia; he became ‘HM’, a star patient studied by neuroscientist 
Suzanne Corkin for almost 50 years (see D. Draaisma Nature 497, 
313-314; 2013). Luke Dittrich offers a very different perspective — 
he is the grandson of William Scoville, the lobotomist who operated 
on Molaison. Dittrich fleshes out the official account with nuanced 
biographies of the troubled Scoville and profoundly damaged 
Molaison, revelatory conversations with Corkin and accounts of 
behind-the-scenes scientific scuffles. Disturbing and illuminating. 


The Book 

Keith Houston W. W. NORTON (2016)

The physical book has reigned as an agent of culture for 1,500 years. 
Keith Houston’s deft history of the object wraps entire civilizations 
into the telling, propelling us through the evolution of writing, 
printing, binding and illustration with gusto. The material innovations 
dazzle, from papyrus, vellum and paper (dating to second-century AD 
China) to the spattered path of inks. Equally gripping is the trajectory 
of production technologies, as the finical skill of scribes gives way
to Johannes Gutenberg’s printing revolution and, ultimately, the
streamlined wonders of modern lithography. 


Science and the City: The Mechanics Behind the Metropolis 
Laurie Winkless BLOOMSBURY SIGMA (2016) 

‘Up’, ‘Switch’, ‘Wet’: physicist Laurie Winkless’s chapter headings 
hint at a briskly bouncy ride ahead in this primer on the science 
embedded in cities. And so it proves, as she ponders wind-confusing 
skyscraper design, water-supply technologies such as “fog-sucking 
nets” and 3D-printed bridges. Perhaps most engrossing is her 
evocation of how modern subway systems are built — by delicately 
‘threading the needle’ through dense subterranean convolutions. 
The thickets of subheadings and bolded-up key terms may irk, but 
the witty Winkless has done her homework. 


Venomous 

Christie Wilcox FARRAR, STRAUS AND GIROUX (2016) 

Evolutionary biologist Christie Wilcox mines reams of research
on venomous fauna, a vast cross-taxa group that ranges from the
platypus (Ornithorhynchus anatinus), which delivers venom containing 
83 toxins, to the Komodo dragon (Varanus komodoensis), whose 
anticoagulant-laced version bleeds victims dry. We may cringe at 
snakebite necrosis, but Wilcox reminds us that venoms are “complex 
molecule libraries” with medical potential — so safeguarding their 
biodiversity also preserves biochemical riches. Barbara Kiser 
Reforms set to seep
into India’s schools 


A culture of rote learning in
Indian schools could be partly
to blame for the “copy and paste”
mentality that undermines
the country’s research (see
A. Chaurasia Nature 534, 591;
2016). Instead, children should be
learning the importance of critical
thinking, problem-solving and
real-life application.

Attempts to abolish rote 
learning so far extend only to 
private schools (see go.nature.com/2am4jdb).
However, many
more children stand to gain from 
the innovative non-government 
education initiative Ekal 
Vidyalaya, which uses a creative 
educational approach through 
a system of one-teacher schools 
in rural areas and tribal villages 
(www.ekal.org). 

Early results of public 
consultations by the government's 
Committee for Evolution of the 
New Education Policy and its 
Framework for Action promise 
other alternatives (see go.nature.com/2au3pej).
And the 13 bold
themes related to school education
that have been identified as areas
for improvement (see go.nature.com/2aurjby)
should enable a
new future. 

Sanchit Misra Banaras Hindu 
University, Varanasi, India. 
imsam93new@gmail.com


Kudos for female 
Antarctic researchers 


Women scientists were prohibited
from working in Antarctica until
Soviet geologist Maria Klenova
began her research there in 1956.
Despite their contributions
since, women comprise only
11% of medal winners from
the Scientific Committee on
Antarctic Research. Our aim is
to raise the profile of influential
female researchers to inspire the
roughly 60% of early-career polar
scientists who are women.
Notable contributions by
women include the discovery
of potential methane reservoirs
beneath Antarctica (5 female
authors out of 13: J. L. Wadham 
et al. Nature 488, 633-637; 
2012); the finding that snow 
melting accelerated in the 
twentieth century (4 of 9 authors: 
N. J. Abram et al. Nature Geosci. 
6, 404-411; 2013); and insights 
into life in the deep Southern 
Ocean (12 of 21: A. Brandt et al. 
Nature 447, 307-311; 2007). The 
directors of the two largest polar 
institutes, the British Antarctic 
Survey and the Alfred Wegener 
Institute in Germany, are women. 
To boost recognition of such 
achievements, we are writing 
referenced biographies for 
prominent female Antarctic 
scientists, and have received 
170 nominations from 
30 countries (see go.nature.com/2azwkjq).
Examples include
In-Young Ahn of the Korea Polar 
Research Institute, the first Asian 
woman to lead an Antarctic 
station, and Lois Jones, who 
in 1969 led the first all-female 
Antarctic research team. 
Jan Strugnell* La Trobe 
University, Melbourne, Australia. 
j.strugnell@latrobe.edu.au 
*On behalf of 7 correspondents (see 
go.nature.com/2akdzbd for full list). 


Funding: would 
Mendel have won it? 


The finding that interdisciplinary 
research has low funding success 
touches a sore spot in molecular 
biology (see L. Bromham et al. 
Nature 534, 684-687; 2016). The 
skilful integration of physics, 
mathematics and biology that led 
to the development of molecular 
biology is being superseded by the 
use of bioinformatics tools that 
can process and visualize large 
amounts of experimental data. 
Yet these tools often deliver only 
incremental advances in complex 
topics (for instance, in the function 
of transcriptional networks). 
Genuinely interdisciplinary
landmark discoveries include
the stochastic nature of gene
expression and the realization
that biological systems are
‘noisy’ (M. B. Elowitz et al.
Science 297, 1183-1186; 2002),
and the finding that there
was interbreeding between
Neanderthals and ancestors of
modern humans (R. E. Green
et al. Science 328, 710-722;
2010). That discovery relied on
sophisticated sample-preparation
methods and advanced statistical
analysis to reconstruct the flow of
genetic material between ancient
genomes.

Historically, scientific 
curiosity has been driven by 
interdisciplinary knowledge. 
Gregor Mendel, for example, 
trained as a physicist. Modern 
teaching tends to gloss over the 
mathematical insights that his 
theory of inheritance required. 

I suspect that few biologists
today could identify binomial
distributions in pea-plant
cross-breeding experiments
and conclude that independent
alleles are randomly segregated.
Daniel Hebenstreit University of 
Warwick, Coventry, UK. 
d.hebenstreit@warwick.ac.uk 
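
The binomial pattern mentioned in the letter above can be illustrated with a short calculation. This is a minimal sketch, not a reconstruction of Mendel's analysis: it assumes a textbook monohybrid cross of two heterozygous plants, in which each offspring independently shows the dominant trait with probability 3/4, and the sample size of eight offspring is hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumption for illustration: Aa x Aa cross, so each offspring shows
# the dominant phenotype with probability 3/4, independently of the others.
P_DOMINANT = 0.75
N_OFFSPRING = 8  # hypothetical number of offspring scored

# Probability of observing k dominant-phenotype plants, for k = 0..N_OFFSPRING.
distribution = [
    binomial_pmf(k, N_OFFSPRING, P_DOMINANT) for k in range(N_OFFSPRING + 1)
]

for k, prob in enumerate(distribution):
    print(f"{k} dominant of {N_OFFSPRING}: {prob:.4f}")

# The mean of the distribution is n * p.
expected_dominant = sum(k * prob for k, prob in enumerate(distribution))
print(f"Expected number of dominant offspring: {expected_dominant:.2f}")
```

Comparing observed counts with such a distribution is one way to test whether the data are consistent with random, independent segregation.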


Funding: spot value 
in grant proposals 


Interdisciplinary projects might 
have more funding success if 
some review-panel members 
had interdisciplinary research 
experience (see L. Bromham et al. 
Nature 534, 684-687 (2016) and 
go.nature.com/2at80wd). 

Such reviewers are more likely 
to grasp the importance of lines 
of investigation that fall outside 
disciplines. Our study on the 
feasibility of treating heroin users 
with pharmaceutical heroin, 
for example, called for research 
into whether this perceived 
permissiveness might influence 
illicit drug use and have a
‘honeypot’ effect (G. Bammer
Palgrave Commun. 2, 16017; 2016). 

Interdisciplinary reviewers 
also recognize that disciplinary 
research that is not cutting-edge 
can still warrant funding if it
sheds light on an interdisciplinary 
problem. Our insights into 
heroin-addiction treatment
came from, among others,
economists who determined the
likely impact on the drug market; 
demographers who estimated 
the number of heroin users; and 
philosophers who assessed the 
ethics of prescribing heroin. 

The grant-review process could 
be improved if disciplinary and 
interdisciplinary panel members 
had a better understanding of 
how their views interact, and if 
guidelines could be drawn up for 
their relative contributions to the 
overall assessment. 

Gabriele Bammer Australian 
National University, Acton, 
Australia.
gabriele.bammer@anu.edu.au


Satellite company 
clarifies proposal 


As chief executive of the satellite- 
communications company Ligado 
Networks, I wish to emphasize 
that our proposed sharing of a
small block of radio frequencies
with the US National Oceanic
and Atmospheric Administration
(NOAA) will not jeopardize the 
delivery of weather information 
from satellites (see Nature 535, 
208-209; 2016). 

We are exploring this idea 
through open dialogue with the 
US Federal Communications 
Commission, NOAA and 
the weather community. The 
company was invited to discuss 
radio frequencies at an American 
Meteorological Society meeting 
last month, and feedback on our 
plan’s potential impact is allowing 
us to home in on remaining 
concerns and discuss solutions. 

We are also exploring an 
alternative network for providing 
real-time weather data to more 
users at a lower cost. This would 
protect NOAA’s existing uses
of the band and expand the 
availability of a high-demand 
wireless spectrum. Our 
technology could deliver reliable, 
secure connectivity to critical 
industries, including those that 
serve public safety. 

Doug Smith Ligado Networks, 
Reston, Virginia, USA. 
spectrum@ligado.com 
Fast track for silver


A solid composite material has been made that conducts electricity through the rapid transport of silver ions, which diffuse 
faster than in some liquids. The material holds promise for applications in charge-storage devices. SEE ARTICLE P.159 


TOM NILGES 


What happens when two compounds
that have contrary properties are
mixed together? Do they work
against each other, or do they combine constructively
to produce unexpected effects?
On page 159, Chen et al. report an impressive
example of the second outcome. They
have combined a material that conducts 
electricity purely through negatively charged 
electrons with one that conducts through the 
fast movement of positively charged ions, to 
create a composite that they call an artificial 
mixed conductor. The composite exhibits 
impressively fast ion diffusion, and has the 
potential to be of use in batteries. 

The electron conductor in the composite 
is graphite, a carbon allotrope composed of
layers made up of six-membered carbon rings. 
This layered structure means that conduction 
in graphite varies with the direction of the cur- 
rent. Graphite can conduct negatively charged 
electrons but can also host various highly 
mobile ions, and is a widely used electrode in 
energy-storage devices. Atomically thin layers 
of graphite are called graphene, and can act as 
a membrane that conducts protons.

The other component in the authors’ composite
is rubidium silver iodide (RbAg4I5), the best known
solid conductor of silver ions at room temperature.
This compound can itself
be thought of as a composite of silver iodide 
(AgI) and rubidium iodide (RbI). Silver ions 
are positively charged, and are large and heavy 
compared with most other charge carriers. But 
the rubidium ions in the iodide framework 
of RbAg4I5 are perfectly arranged to provide
vacant sites for the silver ions to ‘jump’ into. 
This allows the silver ions to move almost 
freely in all directions through the solid. 

If graphite and RbAg4I5 are combined, then
a solid, mixed conductor system might be
generated that enables charge transport or
conduction by two different charge carriers
at the interfaces between the two compounds,
potentially offering extremely high conductivity.
Conductors that naturally allow conduction
through both electron and ion transport
have been widely studied and are used in processes
that benefit from such optimized conductivity.
For example, they have applications
in energy-storage devices such as batteries and
supercapacitors, and in sensor devices.

Figure 1 | Interfacial mass transfer. Chen et al. have prepared a composite of rubidium silver iodide
(RbAg4I5, a material that conducts using silver ions as charge carriers) and graphite (in which electrons
carry charge). When the composite is connected to a silver electrode (not shown), the authors observe
reversible rapid movement of silver ions that leads to either a reduction or an increase in the amount of
silver in the RbAg4I5, depending on the direction of the current. When the amount of silver decreases,
electron holes (quasiparticles caused by the absence of electrons) in the graphite compensate for vacancies
caused by the absence of silver ions in the RbAg4I5, at the interfaces between particles of the two materials
within the composite. When silver is added, the extra silver ions in the RbAg4I5 are balanced by electrons
in the graphite.

Duality of charge transport has another 
advantage: it can be used to change the stoichio- 
metry (the ratio of atom types described by a 
chemical formula) of a conductor, because the 
addition or removal of a given charge carrier 
can be compensated for by adding or remov- 
ing the oppositely charged carriers. This 
enables fast transport, storage and redistribu- 
tion of mass, which is also useful for charge- 
storage devices. For example, an extra positively 
charged ion such as Ag+ can be compensated
for by the addition of a negatively charged electron,
whereas a vacant Ag+ ion can be balanced
by adding an electron hole (a quasiparticle 
corresponding to the absence of an electron; 
Fig. 1). In pure electron- or ion-conducting 
systems, such compensation processes are 
disfavoured or almost impossible, in most 
cases because of the lack of oppositely charged 
carriers. A solution to this problem that allows 
certain stoichiometry changes could be to com- 
bine both types of system to form a hybrid. 
Chen and colleagues’ graphite-RbAg4I5
composite is just such a clever combination.
The authors prepared the material by grinding
the components together in a mortar and then
melting the mixture, to bring the two types of
conductor into intimate contact with each 
other. In a series of experiments, the authors 
observed impressively fast (occurring within 
seconds) and pronounced stoichiometry 
changes (from approximately —10~ to about 
4 x 10° for silver) at the interfaces between 
the two compounds. This behaviour was com- 
bined with an extraordinarily high diffusion of 
silver ions at room temperature. 

But how do the components of the com- 
posite enable stoichiometric changes? In the 
case of silver being added to the composite, 
the extra silver is stored within the ion conductor
(RbAg4I5), whereas the compensatory
electrons are hosted by the electron conductor
(graphite). Both compounds contribute
their best talents to this joint effort: RbAg4I5
effectively transports silver ions and stores
them at vacant surface sites, and graphite acts
as an electron sponge. The situation is different
in the case of silver being removed from the
composite: RbAg4I5 releases silver and forms
vacancies that are compensated for by elec- 
tron holes in the graphite. Once again, the job 
is shared by the two compounds. 

Chen and co-workers also provide a detailed 
analysis of the physics behind the observed fast
diffusion process, and show that the classical
theory of chemical diffusion must be reconsidered
in the case of interface-driven job-sharing
processes. In particular, there should be a
reassessment of the roles of chemical capaci- 
tance (a material’s ability to take up or release 
chemical components such as silver ions) and 
of electrostatic energy in changing the charge- 
carrier concentration at the interface. 

Finally, the authors built two all-solid-state 
prototype energy-storage devices — a bat- 
tery and a supercapacitor — to demonstrate 
potential practical applications of their com- 
posite. The battery can be reversibly charged 
and discharged using extremely high cur- 
rents (and therefore within 0.05 seconds), 
whereas the supercapacitor provides ultrafast 
charge release, which is needed for various
applications of these devices. Both effects are
a direct result of the high mass transport in the 
system and its compositional flexibility. 
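
To put the quoted charging time in context, battery charging speed is often expressed as a 'C-rate', where 1C corresponds to a full charge or discharge in one hour. The sketch below is a rough illustration only: it assumes that the 0.05-second figure refers to one complete charge, which the text does not state, and the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600.0


def c_rate(charge_time_s: float) -> float:
    """C-rate for a full charge completed in charge_time_s seconds."""
    if charge_time_s <= 0:
        raise ValueError("charge time must be positive")
    return SECONDS_PER_HOUR / charge_time_s


# Assumption: the prototype completes a full charge in 0.05 seconds.
prototype_rate = c_rate(0.05)
print(f"Equivalent C-rate: {prototype_rate:,.0f}C")
```

By the same definition, a charge that takes one hour is 1C and one that takes six minutes is 10C.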

It will be exciting to see whether the 
concept of an artificial mixed conductor 
system can be transferred to other solid ion 
conductors, such as the promising ‘argyrodite’-type
solids, in which lithium ions have
unusually high mobility. Another question
is whether graphite is the optimal electron 
conductor in these mixed conductors. Per- 
haps graphene sheets, or graphite consisting 
of just a few stacked graphene sheets, could be 
used instead, to optimize the number of inter- 
faces per unit volume between the two types of 
conductor. Materials scientists, chemists and 
physicists will no doubt be keen to adopt this 
concept to create materials that have hybrid 


Nanocolumns at the 
heart of the synapse 


Ananocolumn spans the synaptic cleft between neurons, connecting regions 
of neurotransmitter molecule release and capture. This discovery informs on 
mechanisms of synaptic organization and regulation. SEE LETTER P.210 


STEPHAN J. SIGRIST & ASTRID G. PETZOLDT 


he sophisticated human brain forms 
the foundation of all the cognitive 
processes that define us as self-con- 
scious and social individuals. These processes 
are fundamentally based on the operation of 
a single functional unit — the synapse, which 
enables rapid signal transmission between 
neurons. Synapses are composed of two small, 
highly specialized compartments, one on the 
presynaptic (transmitting) side and one on 
the postsynaptic (receiving) side of the small 
gap that separates the two neurons. Structures 
spanning this synaptic cleft to coordinate 
these compartments have been suggested’, 
but direct evidence for their existence remains 
scarce. On page 210, Tang et al.” use a com- 
bination of elaborate super-resolution light 
microscopy and mathematical modelling to 
provide evidence for the existence of discrete, 
protein-based nanocolumns that connect the 
pre- and postsynaptic compartments. 
During neuronal signalling, electrical 
impulses called action potentials trigger the 
release of neurotransmitter molecules from the 
presynaptic neuron. Release involves fusion of 
neurotransmitter-containing synaptic vesicles 
with a region of the cell membrane called the 
active zone, which faces the synaptic cleft. 
Vesicle docking and fusion does not occur 
in isolation, but within an extended protein 


scaffold made up of several large multi-domain 
proteins’ that provides sites for synaptic- 
vesicle fusion. 

Dissecting the organizational princi- 
ples of these scaffolds was, for many years, 
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functionality, potentially opening up fresh 
applications. = 
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achievable only by electron microscopy, which 
is not compatible with live imaging. However, 
this limitation has been overcome, thanks to 
the development of super-resolution light-microscopy
techniques, which allow the
efficient visualization of distinct protein architectures.
One such study has revealed that
presynaptic scaffolds physically contact synaptic 
vesicles, perhaps promoting their docking and 
priming for neurotransmitter release at defined 
fusion sites. Other studies have shown that one 
scaffold protein, RIM, has a prominent role in 
synaptic-vesicle docking — RIM interacts with 
proteins of the MUNC-13 family to promote
clustering of calcium-channel proteins, which
in turn trigger fusion processes. 

In a quest to further decipher the nanoarchitecture
of presynaptic active zones,
Tang et al. turned to a high-resolution form
of light microscopy called stochastic optical
reconstruction microscopy (STORM). The
authors used 3D STORM to study synapses
between in-vitro cultured mouse neurons
derived from the brain’s hippocampus region,
which is involved in learning and memory. The
synapses under observation release the neurotransmitter
glutamate. This analysis revealed
that RIM is confined to protein nanoclusters
of around 80 nanometres in diameter that
lie close to the active zone. By contrast, other
scaffold proteins and fusion factors, such as
MUNC-13 and Bassoon, showed a more
uniform distribution.

Figure 1 | Architecture of a synapse. Tang et al. report that the synaptic connections between neurons
are bridged by nanocolumn structures. The scaffold protein RIM is enriched in 80-nanometre-wide
clusters at sites on the presynaptic membrane to which synaptic vesicles fuse close to calcium channel
proteins and release neurotransmitter molecules into the synapse. On the postsynaptic neuron, sites rich
in the scaffold protein PSD-95 contain clusters of neurotransmitter receptor proteins. RIM-rich and
PSD-95-rich regions align to define the nanocolumn.

Is the position of these RIM-rich nano- 
clusters related to vesicle-fusion sites? 
The researchers monitored fusion events 
at the presynaptic membrane using a 
protein-based sensor that fluoresces fol- 
lowing vesicle-membrane fusion. Mathemat- 
ical modelling of the fluorescence patterns 
revealed that fusion sites are restricted to 
particular regions of the membrane. Moreover,
a different form of super-resolution
light microscopy, photoactivated localization
microscopy, which allows live imaging,
confirmed that RIM density increased within
40 nm of these fusion sites.
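
The idea of a local density increase can be made concrete with a toy calculation. The sketch below is not the authors' analysis: it uses synthetic two-dimensional points, a hypothetical 400-nm square field and a made-up cluster placed on a single hypothetical fusion site, and simply compares the density of points within 40 nm of that site with the average density over the field.

```python
import math
import random


def local_density(points, center, radius_nm):
    """Number of points per square nanometre within radius_nm of center."""
    count = sum(1 for p in points if math.dist(p, center) <= radius_nm)
    return count / (math.pi * radius_nm**2)


random.seed(0)
FIELD_NM = 400.0              # hypothetical square field, 400 nm per side
fusion_site = (200.0, 200.0)  # hypothetical fusion-site position

# Synthetic localizations: a uniform background plus a cluster
# centred on the fusion site (standard deviation 20 nm).
background = [
    (random.uniform(0, FIELD_NM), random.uniform(0, FIELD_NM))
    for _ in range(300)
]
cluster = [
    (random.gauss(200.0, 20.0), random.gauss(200.0, 20.0))
    for _ in range(100)
]
localizations = background + cluster

# Compare density near the fusion site with the field-wide average.
near = local_density(localizations, fusion_site, 40.0)
overall = len(localizations) / FIELD_NM**2
enrichment = near / overall
print(f"Enrichment within 40 nm: {enrichment:.1f}-fold")
```

An enrichment well above 1 indicates clustering near the chosen site; real super-resolution data would also require handling of localization error and edge effects, which this sketch ignores.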

The authors next investigated whether 
the RIM-rich fusion sites might be coordi- 
nated with the position of the postsynaptic 
apparatus dedicated to receiving the neuro- 
transmitter signal. A sophisticated scaffold 
resides close to the postsynaptic membrane — 
a part of which is the multi-domain protein 
PSD-95, which is involved in the clustering of 
AMPA- and NMDA-type glutamate receptor 
proteins. Precise measurement of RIM and
PSD-95 densities revealed a clear spatial correlation
between the components. Tang et al.
therefore concluded that a nanoscale columnar 
structure spans the synaptic cleft, bring- 
ing RIM-enriched sites of synaptic-vesicle 
fusion face-to-face with postsynaptic PSD-95 
nanodomains (Fig. 1). 

Finally, Tang and colleagues asked if the 
nanocolumns could be a stable architectural 
motif or whether they are involved in the reg- 
ulatory changes in synaptic strength that are 
crucial for cognitive functions. The authors 
pharmacologically activated NMDA receptors 
to depress synaptic strength. Although there 
was no immediate change in the architecture 
of the nanocolumn, after 25 minutes a subset 
of RIM nanoclusters suddenly grew larger — 
notably only those lying opposite PSD-95 
nanodomains and residing in nanocolumns. 
Thus, retrograde signals that mediate the 
upregulation of presynaptic release in response 
to postsynaptic changes might specifically tar- 
get the scaffold proteins and release machinery 
located opposite the postsynaptic glutamate 
receptors to modulate synaptic strengthen- 
ing. As such, the nanocolumn could provide 
an important regulatory platform. 

This study generates pressing questions. For
instance, to understand the physical nature
of the nanocolumns, it would be interesting 
to determine what regulates their forma- 
tion. Trans-synaptic pairs of cell-adhesion 
membrane proteins are obvious candidates 
for mediating nanocolumn formation. Per- 
haps such adhesion molecules ultimately 
control the positioning and recruitment 
of RIM. 

Alternatively, diffusible signals might cross 
the cleft and specifically trigger assembly of 
nanocolumns on the scale of a few tens of 
nanometres. In addition, RIM itself could be 
involved in nanocolumn formation — RIM 
contains a central domain that binds to the 
intracellular part of calcium channels, which
ultimately trigger synaptic-vesicle fusion. 

In the future, the nanocolumn concept 
should be validated and extended by inves- 
tigating more proteins, including synaptic 
cell-adhesion proteins and other cytoplasmic 
scaffold proteins, and by combining imaging 
with genetic manipulation. Although the details 
of trans-synaptic coordination and the proteins 
involved might turn out to vary between syn- 
apse types and organisms, the nanocolumnar 
architectural motif could be a fundamental and
generic building principle for synapses.

A modern map of the
human cerebral cortex 


An authoritative map of the modules that make up the cerebral cortex of the
human brain promises to act as a springboard for greater understanding of brain
function and disease. SEE ARTICLE P.171


B. T. THOMAS YEO & SIMON B. EICKHOFF 


The human brain’s cerebral cortex is
crucial for sensory and motor processing,
as well as for mental functions such
as interpreting language and logical reasoning, 
the complexity of which distinguishes us from 
other animals. On page 171, Glasser et al.¹
describe an updated map of the human cerebral 
cortex. This long-awaited advance provides a 
reference atlas that will allow those researching 
brain structure, function and connectivity to 
work within a common, systems-neuroscience 
framework. 

Regional differentiation within the cerebral 
cortex has long prompted attempts to identify 
the cortex’s distinct compartments, from clas- 
sical neuroanatomical studies at the beginning 
of the twentieth century to modern non-invasive,
in vivo methods based on magnetic
resonance imaging (MRI). Such endeavours 
are complicated by the fact that every location 
in the brain can be described by an almost
infinite set of features, including density of 
receptor proteins for various neurotransmit- 
ter molecules, long-range connections to other 
parts of the brain, and specialization for neural 
computations that support specific functions. 
Almost all previous studies have attempted 
to delineate cortical compartments using a 
single feature (Fig. 1). By contrast, Glasser and 
colleagues capitalize on the unprecedented 
quality and breadth of MRI data gathered by 
the Human Connectome Project, the aim of 
which is to elucidate the neural pathways that 
underlie brain function and behaviour using 
cutting-edge brain-imaging methods.

MRI provides unparalleled access to the 
living brain. A single MRI machine can take 
many different measurements (known as 
modalities) — from establishing the relative 
density of neuron-insulating myelin sheaths 
to determining the thickness of the cortex, 
both of which can vary sharply between 

Figure 1 | Mapping function in the brain. Glasser et al.¹ defined
distinct regions in the human cerebral cortex using a combination of 
brain-mapping techniques that have previously been used only separately, 
including: task-based functional magnetic resonance imaging (fMRI), 
which informs on the functions of different regions; relative density of the 
neuron-sheathing substance myelin, which provides information about 
cortical architecture; and resting-state fMRI, which details neural 


cortical areas. Furthermore, functional MRI 
(fMRI) can measure the changes in blood 
flow that accompany mental tasks, as well as 
whole-brain activity in resting states, provid- 
ing information about regional neural activ- 
ity that accompanies different brain states. 
The authors’ integration of information 
from several MRI modalities not only moves 
this work closer than previous attempts to 
the classical definition of a cortical area, but 
also has several key advantages over other 
investigations. 

First, some modalities reveal borders not 
clearly reflected in others. For instance, the 
border between areas 3a and 3b of the somato- 
sensory cortex (which processes information 
about touch and pain) is easily delineated by 
myelin mapping, but not by resting-state fMRI. 
As another example, Glasser et al. developed a 
resting-state fMRI technique that maps topographic
neural connectivity within the visual
cortex. The sharp transition between levels of 
topographic connectivity across area bounda- 
ries allows much clearer delineation of discrete 
areas involved in early stages of visual pro- 
cessing than do myelin maps or conventional 
resting-state fMRI approaches.

Second, convergence across different MRI 
modalities reduces the likelihood of mis- 
defining borders as a result of feature-specific 
noise or bias. This is important, given the indi- 
rect nature of most modalities — for example, 
fMRI measures the blood-flow changes that 
accompany neuronal activity as a proxy for 
neuronal activity itself. Consequently, complex
computational pre-processing is often neces- 
sary to differentiate signal from noise. Agree- 
ment across modalities increases confidence 
that borders reflect biological reality rather 
than measurement biases. 

Finally, an integrative approach better 
equips researchers to describe the proper- 
ties of each area, as exemplified by Glasser 
and colleagues’ comprehensive supplemen- 
tary material. The authors find, for instance, 
that a cortical area characterized in the 


from ref. 1.) 


1950s by its low myelin content seems to be
involved in language processing as measured 
by task-based fMRI — a finding consistent 
with a recent meta-analysis of more than 
10,000 imaging experiments across 83 behavioural
tasks. Therefore, Glasser and colleagues’
map represents the convergence of
decades of classical neuroanatomical studies 
with modern non-invasive studies. 

In contrast to the burgeoning field of 
resting-state fMRI mapping, which has largely
focused on fully automatic approaches to 
divide the brain into parcels that have homogeneous
connectivity patterns, Glasser and
colleagues used a semi-automatic approach 
that explicitly incorporates prior knowledge 
from neuroanatomical studies to define the 
borders in their map. This inclusion repre- 
sents a crucial and long overdue advance 
over agnostic, exclusively computational 
approaches. However, using prior knowledge 
to choose which modalities to trust in cases 
of conflicting evidence entails the danger of 
introducing confirmatory biases. Moreover, 
it could result in differential mapping quality 
between areas in which there is relevant, well- 
known information — such as the somato- 
sensory and visual cortices — and those for 
which less knowledge exists, such as the pre- 
frontal and parietal cortices. The latter pair is 
of particular interest to many neuroscientists, 
because these areas compute most functions 
that are specific to humans. Indeed, given that 
the authors explicitly ignore certain modality 
information for their data set that is function- 
ally meaningful but fractionates classical corti- 
cal areas, further investigation will be crucial 
to understand how borders that are strongly 
demarcated in only one modality can be dif- 
ferentiated from modality-specific noise. 

On a related theme, although Glasser et al.
have delineated 360 cortical areas, these 
regions could potentially be subdivided into 
smaller, more-uniform units that are less 
distinct from each other. For example, dif- 
ferent portions of the somatosensory cortex 

connectivity within and between different regions. In each of these
three panels, colours provide a heat map of the measurements. The result
is a map that delineates 360 distinct cortical areas. Different colours
represent how connected each area is to sensory inputs (hearing, red; touch,
green; vision, blue) and to systems involved in cognition (light and dark).
Mixed colours show areas in which functional systems overlap. (Images taken


that represent distinct body parts might be 
considered as distinct computational units. 
Furthermore, examples of new areas being 
defined with the advent of more-sensitive or 
complementary methods are commonplace.
As such, it remains unclear what the ‘optimal’ 
number of areas to be defined is — let alone 
the ‘correct’ number. We suspect that the 
optimal number might be application-depend- 
ent. The authors’ work, although seminal, 
will therefore probably not be the final word 
on this topic. 

A key innovation in the current study is an 
automatic algorithm that seeks to delineate 
cortical areas in individual human subjects, 
a much more complex task than producing a 
map of the average brain. Previous work has 
attempted to estimate, in individual subjects, 
10-20 functional networks (for example, see 
ref. 10), but Glasser and colleagues’ goal of 
delineating 360 areas is more ambitious. Cap- 
turing inter-individual biological variability 
and differentiating such variability from meas- 
urement noise is essential to understand the 
relationship between brain organization and 
individual differences in behaviour, as well as 
for clinical applications. 

The authors’ validation of this algorithm 
focused on only a small portion of the cortex, 
so further investigation will be crucial. Nev- 
ertheless, their work represents a major step 
towards individual-specific ‘biomarkers’ of 
brain dysfunction, because individual-specific 
quantities of each area, such as grey-matter 
volume or connectional strength to other 
areas, can now be computed, and could be 
strongly predictive of individual differences 
in behaviour or disease. 

Glasser and co-workers’ atlas is the first 
multimodal map targeted at defining corti- 
cal areas, and therefore represents a major 
advance in human brain mapping. It is now 
up to researchers to use the anatomical frame- 
work provided, compare it with alternative 
approaches to mapping the human brain, and 
populate the defined areas with functional 
and disease-related information. By doing so,
we can begin to integrate multimodal data to 
understand how individual differences in brain 
organization can explain differences in function, 
behaviour and disorder.

Endothelial-cell killing
promotes metastasis 


To migrate into the lungs, cancer cells in the bloodstream must cross the lung’s 
endothelial-cell barrier. A study shows that cancer cells can achieve this feat by 
signalling to induce endothelial-cell death. SEE LETTER P.215 


CLAUDIO R. ALARCON & SOHAIL F. TAVAZOIE 


Cancer cells often migrate from where
the cancer initially formed, to colonize
other parts of the body in a process
called metastasis, which is associated with 
poor clinical prognosis. On page 215, Strilic 
et al.¹ uncover a surprising mechanism that
migrating cancer cells in the bloodstream use 
to cross the lung’s barrier of endothelial cells. 
The authors show that cancer cells send a sig- 
nal that makes endothelial cells undergo a type 
of cell-death program called necroptosis (also 
known as programmed necrosis). Once in the 
lung, the cancer cells form lethal metastatic 
colonies. 

The past three decades have provided 
increasing evidence that supports the key roles 
of endothelial cells in the formation and pro- 
gression of tumours to a malignant state that 
has a poor prognosis for the patient. Tumour
cells rely heavily on the endothelial cells of 
blood vessels to enable continued tumour 
growth, because tumours need blood vessels 
to obtain oxygen and nutrients and expel meta- 
bolic waste. 

Tumour cells exploit and manipulate 
endothelial cells by using intricate signalling 
mechanisms, such as those involving protein 
factors, secreted by tumours, that attract and 
remodel endothelial cells. Remodelling of 
blood vessels by tumour-derived proteins 
can enable cancer cells that reside in the pri- 
mary site of tumour growth to enter the blood 
circulation, providing an escape route for the 
cells to reach distant organs. After entering
the bloodstream, cancer cells must cross the 


endothelial barriers that prevent them from 
entering other organs (Fig. 1a). In certain tissues,
such as the lung or brain, the interface
between the tissue and the bloodstream is
relatively impenetrable to tumour cells.

Strilic and colleagues’ work began with 
an observation made when tumour cells 
and endothelial cells were cultured together 
in vitro. The researchers noted that such co- 
culture leads to an increase in endothelial- 
cell death. However, rather than exhibiting 
the typical cell-shape changes and molecular 
features of apoptosis, the most common form
of programmed cell death, the dying endo- 
thelial cells exhibited features associated with 
another cell-death program called necroptosis. 
For example, the dying cells exhibited compro- 
mised cell-membrane integrity, as monitored 
by dye uptake. 

To confirm the observed cell-death 
mechanism, the authors inhibited proteins 
that mediate necroptosis, and found that this
inhibited tumour-induced endothelial-cell 
death, whereas perturbing apoptotic signalling 
did not. They found that this necroptotic cell- 
death program was activated in both human 
and mouse endothelial cells exposed to a wide 
variety of cancer cell lines. Moreover, intra- 
venous injection of mouse melanoma skin- 
cancer or lung-cancer cells into mice caused 
lung endothelial cells to undergo necroptotic 
death. 

What advantage does killing endothelial 
cells afford tumour cells? Strilic and colleagues 
carried out in vitro experiments in which they 
inhibited necroptosis and observed reduced 
tumour-cell migration across an endothelial- 
cell monolayer, leading the authors to propose 


Figure 1 | Tumour cells migrate into tissues by killing cells that block their entry. a, Strilic et al.¹
show that tumour cells can induce necroptotic cell death of blood-vessel endothelial cells, which enables
migrating (metastatic) cancer cells in the bloodstream to cross the endothelial-cell barrier and enter the
adjacent tissue to colonize a new tumour site. b, Necroptotic endothelial-cell death is induced by amyloid
precursor protein (APP) on the tumour surface, which interacts with the death receptor 6 protein (DR6)
on endothelial cells. Tumour-cell migration from the bloodstream into the adjacent tissue may be
enhanced either directly, as a consequence of endothelial-cell death and the resulting disruption of the
endothelial barrier, or indirectly because of the release of damage-associated molecular pattern molecules
(DAMPs) from dying endothelial cells that could open the endothelial barrier between cells or enhance
tumour migration properties.

that tumour-induced necroptosis enhanced
tumour-cell migration across the endothelial 
barrier. The authors made similar findings in 
in vivo experiments using genetic inactivation 
of the RIPK3 kinase enzyme, a necroptosis 
regulator, in endothelial cells. Inactivation of 
RIPK3 prevented endothelial-cell death, and 
reduced the ability of cancer cells to cross the 
endothelial barrier and enter the lung. Metastatic
tumour-colony formation was reduced
upon genetic or pharmacological inhibition 
of endothelial-cell necroptosis, indicating 
that tumour-induced endothelial-cell killing 
exerted control over metastasis. 

How does endothelial-cell death enhance 
tumour-cell migration across the endothelial 
barrier? Strilic and colleagues propose vari- 
ous mechanisms. Tumour cells could migrate 
through gaps left in the endothelial barrier by 
dead endothelial cells. Another possibility is 
that damage-associated molecular pattern 
molecules (DAMPs), such as ATP released 
from necroptotic endothelial cells, could act 
on neighbouring endothelial cells to open the 
endothelial barrier by enabling tumour-cell 
migration between neighbouring endothelial 
cells that are usually bound together to form 
an impermeable barrier, and/or these signals 
could act directly on tumour cells to enhance 
their migration across the barrier.

How do tumour cells induce endothelial 
necroptosis? The authors used a combination 
of molecular, pharmacological and genetic 
approaches to show that amyloid precursor 
protein (APP) on the surface of tumour cells 
induces necroptotic cell death by interacting 
with death receptor 6 (DR6) on endothelial 
cells (Fig. 1b). Consistent with this, phar- 
macological inhibition of DR6 signalling 
— achieved by injecting mice with a ‘decoy’ 
version of the DR6 receptor — inhibited 
metastasis. 

Strilic et al. provide compelling evidence 
to support the existence of intricate signal- 
ling interactions between migrating tumour 
cells in the bloodstream and the blood-vessel 
endothelium that promote tumour-cell meta- 
static migration into tissue and subsequent 
tissue colonization. These findings raise a 
series of intriguing issues. Only a small frac- 
tion of endothelial cells cultured in vitro with 
tumour cells are induced to undergo necrop- 
totic death. Discovering the molecular deter- 
minant that governs which endothelial cells die 
is a key challenge. The authors reveal that only 
approximately 10% of endothelial cells express 
DR6 and are thus susceptible to APP-mediated 
cell death. 

It will be important to understand the 
mechanisms that regulate which fraction of 
endothelial cells express DR6, and whether 
cancer cells can regulate the susceptibi- 
lity of endothelial cells to necroptosis by 
modulating DR6 expression on the cells. 
Microscopy analysis of human tumours 
could be used to reveal whether an increased 


fraction of DR6-expressing endothelial cells is 
associated with the propensity for lung meta- 
static progression. Perhaps molecular sig- 
nals from the endothelium to tumour cells 
regulate expression or cleavage of APP on 
tumour cells — thus having an effect on endo- 
thelial-cell necroptosis. Such endothelial-cell- 
derived signals have roles in epithelial-cell fate 
and function.

In addition to the mechanisms proposed 
by the authors, another mechanism by which 
endothelial-cell necroptosis might enhance 
tumour migration into tissue could be medi- 
ated by ATP. Release of ATP from dying 
endothelial cells might promote the survival 
of tumour cells during their migration through 
the endothelial barrier into the tissue, a
process that can cause traumatic tumour-cell 
deformation and death. Live-cell microscopy 
imaging of tumour- and endothelial-cell 
dynamics during this interaction¹¹ may be
an ideal means of determining which of the 
intriguing potential cellular mechanisms 
proposed by the authors might underlie 
tumour-cell migration across the endothelial
barrier.

The TORC1 pathway to
protein destruction 


A study of the proteasome, a protein-degradation complex, reveals an
evolutionarily conserved pathway that acts through the protein kinase TORC1 to 
adjust proteasome levels in response to cellular needs. SEE ARTICLE P.184 


LYNNE CHANTRANUPONG & 
DAVID M. SABATINI 


To maintain amino-acid and protein
levels, cells must couple nutrient avail- 
ability to protein synthesis and turnover. 
Central to this process is the enzyme called 
target of rapamycin complex 1 (TORC1) kinase, 
a master growth controller that integrates 
diverse environmental inputs to coordinate 
many metabolic processes. Rousseau and
Bertolotti reveal on page 184 that inhibition
of TORC1 increases levels of the proteasome
(a large protein complex involved in
cellular protein degradation) to promote cell
survival under stressful conditions. Consistent
with previous reports, the new work identifies
TORC1 as a central regulator of proteasome
homeostasis. However, the relationship
between TORC1 and the control of proteasome
function seems to be complex, because TORC1
can regulate the proteasome through multiple
mechanisms that depend on the particular
cellular context.
The proteasome functions in one of the 
main protein-degradation pathways in cells,
the ubiquitin-proteasome system. In this
pathway, a multi-enzymatic cascade covalently 
links the small polypeptide ubiquitin to pro- 
teins. This modification is recognized by the 
proteasome, which degrades ubiquitinated 
proteins to produce peptide mixtures that 
can then replenish the intracellular pool of 
amino acids.

The proteasome comprises a multi- 
subunit core particle, which carries out 
protein degradation, and up to two addi- 
tional regulatory particle components that 
facilitate substrate recognition, removal of 
ubiquitin, and protein unfolding and translocation
into the proteasome. Inhibition of
the proteasome results in a lethal shortage
of amino acids; therefore cells must maintain
adequate proteasome levels to survive.
However, the mechanisms that govern the 
assembly and regulation of this complex 
molecular machine, particularly under stress- 
ful conditions, are not fully understood. 

The discovery in yeast of Adc17, a stress-induced
regulatory particle assembly chaperone
protein (RAC), offers an insight into
the mechanism of proteasome regulation. 
Rousseau and Bertolotti used Adc17 as
a starting point to investigate the pro- 
teasome. They treated yeast with the 
antibiotic tunicamycin to induce the 
unfolded-protein response, a cellular 
stress response to the presence of mis- 
folded or unfolded proteins. They found 
that yeast upregulates Adc17 levels in the 
presence of tunicamycin, and that loss 
of the protein Sfp1, a negative regula- 
tor of TORC1, abrogates this increase 
of Adc17. The authors established that 
the increase in Adc17 requires inhibi- 
tion of TORC1. Pharmacological sup- 
pression of TORC1 by the compound 
rapamycin or genetic inhibition of 
TORC1 by inactivation of KOG1,
which encodes an essential TORC1 
subunit, are sufficient to increase the 
expression not only of Adc17, but 
also of all other known RACs and 
of proteasome subunits. 

To understand how TORC1 might 
mediate an increase in proteasome abundance,
the authors focused on Mpk1, a
yeast enzyme known as a mitogen-activated
protein kinase (MAPK) that functions
downstream of TORC1, and which
is essential for the survival of cells in 
which tunicamycin has induced a stress- 
ful increase of unfolded proteins. Rous- 
seau and Bertolotti found that Mpk1 
is required for the TORC1-mediated 
increase in RACs and proteasome subunits
(Fig. 1a). Neither the abundance
of their messenger RNAs nor the pro- 
tein stability of these RACs and proteasome 
subunits was altered in response to rapamy- 
cin-mediated inhibition of TORC1, which 
indicates that the increased levels of these 
proteins probably occur through regulation of
mRNA translation. 

An enhanced proteasomal capacity 
enables cells to adapt to the rising demand for 
protein degradation that accompanies stress. 
The absence of proteasome induction, as tested 
by Rousseau and Bertolotti using cells in which 
the gene for Mpk1 had been deleted, severely 
impairs the clearance of ubiquitinated proteins 
and of well-characterized reporter substrates 
used to monitor proteasomal activity. 

The authors found that in mammalian cells, 
ERK5, the mammalian equivalent of Mpk1,
also facilitates a rapid rise in RAC and protea- 
some levels when mTORC1 (the mammalian 
equivalent of TORC1) is inhibited (Fig. 1b). 
Thus, the TORC1 and Mpk1 pathway is an 
evolutionarily conserved regulator of protea- 
somal homeostasis. 

Rousseau and Bertolotti’s work contrib- 
utes an additional perspective to the current 
debate about the exact relationship between 
TORC1/mTORC1 and the regulation of pro- 
teasome function. Consistent with the model 
proposed by Rousseau and Bertolotti, a study 
by Zhao et al. found that acute pharmacological

Figure 1 | Evolutionarily conserved regulation of
proteasome abundance. a, Rousseau and Bertolotti report
that activation of the yeast mitogen-activated protein kinase
enzyme (MAPK) known as Mpk1 mediates an increase in
the levels of regulatory particle assembly chaperone proteins
(RACs) and subunits required for the formation of the
proteasome complexes that mediate protein degradation.
Mpk1 is activated by inhibition of TORC1 protein kinase
activity. TORC1 can be inhibited by tunicamycin or rapamycin
treatment or by the action of the protein Sfp1. b, The authors
also show that this process of proteasomal regulation is
conserved in mammals: ERK5, the mammalian protein
kinase most similar to Mpk1, is required to upregulate
mammalian proteasome levels through an increase in RACs
and proteasome subunits upon mTORC1 inhibition by
compounds such as rapamycin or by nutrient starvation.
In both yeast and mammals, an increase in proteasome
abundance is necessary for cell survival under stress.

inhibition of mTORC1 in the HEK293
mammalian cell line upregulates protein 
degradation by the proteasome. However, a 
report by Zhang et al. reveals nuances in the
regulation of the proteasome by mTORC1,
and finds that in the absence of the protein
TSC2, a major inhibitor of the mTORC1 pathway,
the transcription factor NRF1 mediates
an increase in levels of the proteasome and of 
intracellular amino acids. 

The differences between these three 
studies probably arise from variations in
the extent to which mTORC1 is perturbed.
Under acute mTORC1 inhibition, as studied
by Rousseau and Bertolotti and Zhao
et al., upregulation of the proteasome would
increase amino-acid pools and permit the 
translation of proteins necessary for sur- 
vival. mTORC1 inhibition induces autophagy, 
another major intracellular protein-degrada- 
tion pathway that removes proteins in bulk 
from the cytoplasm and delivers them to an 
organelle called the lysosome for breakdown.
In combination, the rapid and coordinated 
activation of both the autophagic and prot- 
easomal arms of protein degradation would 
be beneficial to cells as a mechanism to 
increase amino-acid levels under stress or 
nutrient deprivation. 

However, under states of prolonged mTORC1 
hyperactivation (for example, when
TSC2 is lost, as investigated by Zhang
et al.), cells may also need to increase
proteasomal capacity to counteract unrestrained
consumption of resources driven
by sustained mTORC1 activity. It will be
informative to compare the regulation
of the proteasome in genetic models in
which mTORC1 is constitutively active
but not hyperactivated, for example in
mice that have a constitutively active Rag
GTPase enzyme.

How TORC1 inhibition increases
proteasome-dependent degradation
is another question requiring further
investigation. Rousseau and Bertolotti
found that this upregulated proteolysis
depends on elevated proteasome levels, 
whereas the study by Zhao et al. found
that enhanced ubiquitination drives 
protein breakdown without a change in 
proteasome content or activity. It will 
also be of interest to determine whether 
specific proteins are preferentially tar- 
geted for proteasomal degradation when 
TORC1 is inhibited. Consistent with this 
possibility, Zhao et al. found evidence
for the selective proteasomal breakdown 
of growth-related proteins. Finally, given 
the integral link between ubiquitination 
and the proteasome, it is probable that 
both systems are concomitantly regu- 
lated under stress. The identification 
of enzymes called ubiquitin ligases and 
deubiquitinases, which are necessary to 
target substrates specifically to the pro- 
teasome, may provide a way to address this 
question. 

From all these studies, it is clear that the
TORC1/mTORC1 pathway is a central regu- 
lator of proteasome homeostasis. It will be 
necessary to resolve the differences in cur- 
rent models of how this pathway affects the 
proteasome, especially given that modula- 
tion of the proteasome might be a therapeutic 
approach for diseases such as cancer and 
neurodegeneration.

Fat and the fate of
pancreatic tumours 


In obese people with pancreatic cancer, the many interactions between fat 
cells and the inflammatory microenvironment surrounding the tumour lead to 
below-average prognosis and chemotherapy outcome. 


MELEK CANAN ARKAN 


The increasing prevalence of obesity
will have an even greater effect on the
health-care system than previously
predicted, because obesity turns out to be a 
major risk factor for the development of cancer.
Obese individuals have a substantially
elevated risk for a type of pancreatic cancer 
known as pancreatic ductal adenocarcinoma, 
which is the fourth most-common cause of 
cancer-associated death. An inflammatory
microenvironment is a hallmark of cancer, 
but little is known about how alterations in 
the surrounding connective tissue (stroma) 
contribute to tumour initiation and progres- 
sion in obesity. Writing in Cancer Discovery, 
Incio et al. report their investigation into how
fat cells in the microenvironment surround- 
ing cancer cells contribute to tumour initiation 
and progression in both mice and humans. 

Tumour formation in the pancreas involves 
striking structural distortion of tissue, which 
is attributed to the disruption of digestive- 
enzyme-containing acinar cells, tissue infiltra- 
tion by immune cells, a strong fibrotic response 
(also known as fibrosis, the formation of excess 
connective tissue or collagen protein around 
the tumour), and a higher than usual level of 
deposition of extracellular-matrix material. 
Cancer lesions in obese individuals are com- 
monly associated with increased fat-cell (adi- 
pocyte) content compared with tumours from 
non-obese patients; however, the function of 
these fat cells in pancreatic cancer remained 
unclear until now. 

Incio and colleagues show that, in mice, 
adipocytes, along with immune cells and 
pancreatic stellate cells, signal through the 
IL-1 protein and the AT1 angiotensin recep- 
tor to drive migration of immune cells called 
neutrophils to the tumour microenvironment. 
This increases the inflammatory and fibrotic 
response in the pancreatic-cancer microenvi- 
ronment in a way that results in poor response 
to chemotherapy and poor prognosis. 

In obese mice, the tumour microenviron- 
ment was shown to contain adipocytes that are 
increased in both size and number, partly as 
a result of tumours invading the neighbour- 
ing white adipose tissues. The researchers 
observed an abundant fibrotic response in 


tumour areas that were enriched in adipo- 
cytes or located adjacent to adipose tissue. 
These results suggest that fibrosis is a hall- 
mark of adipose tissue in obese subjects with 
pancreatic cancer, and that the accumulation 
of the extracellular-matrix protein collagen, 
a component of the fibrotic response, in the 
vicinity of fat cells is a prominent characteristic 
of obesity. Incio and colleagues also found that 
adipocyte infiltration into the tumour micro- 
environment correlates with worse prognosis 
and treatment outcome in patients. 

The authors hypothesized that, in people 
with pancreatic cancer, obesity-associated 
adipocyte accumulation increases fibrosis, 
promotes tumour progression and hinders the 
delivery and efficacy of chemotherapeutics. 
When they checked the percentage of perfused 
blood vessels in a given area of mouse tumour, 
they found that it was significantly reduced in 
obese animals. To determine whether impeded 
perfusion through blood vessels is responsi- 
ble for inefficient delivery of chemotherapeu- 
tic agents, the authors measured the uptake 
of the chemotherapy drug 5-fluorouracil in 
mice. Obesity significantly decreased tumour 
uptake of the drug compared with uptake in 
non-obese control animals, thereby reducing 
the chemotherapy’s efficacy (Fig. 1). 

Chronic fibrosis is thought to have a crucial 
role in enhancing tumour growth and in attenuating
drug delivery. However, in previous
studies, inhibition of chronic fibrosis by either
inhibitor compounds³ or genetic mutations⁴
resulted in increased immunosuppression, 
accelerated tumour growth and decreased 
survival, implying that tumour stroma may be 
restrictive to tumour growth. 

By contrast, Incio et al. show that inhibition 
of the major pro-fibrotic pathway of AT1 
signalling in mice inhibited tumour progres- 
sion. The authors propose that migration into 
the tissue of tumour-associated neutrophils 
and IL-1 production are leading drivers in 
the regulation of tumour growth in this con- 
text, although changes in vascular perfusion 
due to reduced blood pressure also play a 
minor part. When the authors depleted neutrophils
or blocked the activity of IL-1β using
antibody treatment, the immunosuppressive 
microenvironment was reshaped and the pro- 
gression of pancreatic cancer was reduced. 
50 Years Ago 


“We wuz robbed’ — The World Cup 
which has recently been enacted 

in Britain may have been fun to 
watch, but there is no question that 
it was a thoroughly badly designed 
experiment ... The mere fact that a 
Poisson distribution can describe 
so well the distribution of scores by 
individual teams goes a long way to 
suggest that the teams were much 
of a muchness in talent and their 
scores were independent of each 
other. From this point of view, the 
decision that the outcome of the 
whole competition should depend 
on the outcome of a single game 
between the two so-called finalists 
was as much of a farce as a great 
many West German supporters 
already know it to have been... 

If, for example, it were agreed that 
... no team should be declared the 
winner until its score exceeds that 
of its opponent by three standard 
deviations of Poisson distribution, 
it might be necessary to design the 
game of football so that it would be 
practicable for one side to score 100 
goals or so... Such a change could 
easily be brought about, possibly 
by widening the goalposts or by 
abolishing goalkeepers. 

From Nature 13 August 1966 


100 Years Ago 


The History of the Family. By Prof. 
W. Goodsell — In what sense is it 
right to speak of the history of the 
family? ... Can it be said to have a 
history? ... Some such questions 
as these arise in one’s mind as one 
takes up Prof. Goodsell’s book... 
even a casual reader will be struck 
by a want of precise references in 
certain of the chapters ... Where 
is the “weight of evidence” which 
shows that polygamy is unpopular 
among savage women? The 
author gives several reasons why 
we condemn it, but there is surely 
room for doubt... 

From Nature 10 August 1916 
Figure 1 | Fat cells remodel the microenvironment around
tumours. Tumours are perfused with blood vessels, which allow
chemotherapy drugs to enter. Incio et al.² report that, in the context of
obesity, access to pancreatic tumours is restricted by poor tumour blood-vessel
perfusion, leading to a decreased response by tumour cells to
chemotherapeutic drugs. In obesity, there is an increase in pancreatic stellate
cells, immune cells such as neutrophils and IL-1β signalling molecules, as


When experimentally targeting AT1 signal- 
ling in mice, other processes downstream of 
AT1 signalling — such as the epithelial-to- 
mesenchymal cell transition or adipocyte dif- 
ferentiation — might also be affected, and it is 
possible that these processes are responsible 
for the decrease in obesity-associated tumour 
progression. 

Even though the authors correlated fibrotic 
response with tumour size, it is difficult to 
judge whether the fibrotic response or tumour 
growth comes first, because decreased tumour 
progression will eventually result in decreased 
immune-cell infiltration into the tumour 
microenvironment and decreased fibrosis. 
Either way, the results of this study and others³,⁴
reinforce the need for further evaluation
of the functional contribution of fibrosis in the 
initiation and progression of pancreatic cancer, 
especially in obesity. 

Cellular alterations induced by mechanical 
forces are becoming more widely recognized as 
having a role in various diseases⁵. Homeostasis 
in the balance between internal and external 
forces on cells (the state of physical tension 
known as tensional homeostasis) can regulate 
apoptotic cell death, cell proliferation, adhe- 
sion and migration, and its deregulation could 
result in increased susceptibility to cancer. 
In addition, physical cues from the pressure 
exerted by solid-tissue components of the 
tumour microenvironment can compress 
blood vessels, causing poor tumour perfusion⁶. 

Incio and colleagues showed that treatment 
with the AT1 blocker losartan can reduce 
mechanical stress on cells and decrease tumour 
growth in mice with pancreatic cancer. More
research is needed to investigate the type of
transcriptional switch that tensional homeo- 
stasis induces in the dense cellular milieu of 
the tumour microenvironment in obesity. 
Inhibition of the mechanical forces acting on 
cells in pancreatic cancer may provide further 
clues for future clinical treatment. Normalizing 
the tumour extracellular matrix by reducing 
matrix stiffness may be more effective and 
safer than trying to delete stromal components 
directly. 

The relationship between fat cells and stem- 
cell regulation is another key question. Mature 
white adipocytes — fat stores that control 
energy metabolism — respond to nutritional 
and hormonal cues through the secretion of 
signalling proteins. Studies point to white adi- 
pose fat having functions in tissue regenera- 
tion and stem-cell regulation, placing fat cells 
at the centre of multiple aspects of cancer progression⁷.
Mesenchymal stem cells or stromal
stem cells contribute substantially to adipocyte
generation. Mechanical stress is also a
trigger for the expansion of some stem-cell
populations⁸.

It would be interesting to determine the 
origin of adipocytes in the pancreas, their fate 
and phenotype (whether the cells form white 
or brown fat, as well as the type of component 
they secrete). Other potential topics for inves- 
tigation include studying the contribution of 
adipocyte-derived signals that recruit and pos- 
sibly polarize immune-cell differentiation; the 
function of adipocyte invasion during tumour 
formation; and the role of malfunctioning 
energy metabolism in obesity. 

Incio and colleagues’ work provides a 
well as larger fat cells (adipocytes). The denser cellular microenvironment 
seen in obesity puts extra mechanical tension on the tissue and may restrict 
blood-vessel perfusion. This mechanical tension arises because of the 
signalling crosstalk between adipocytes, neutrophils, pancreatic stellate cells 
and other components of the tissue microenvironment. This crosstalk leads 
to an increase in inflammatory cells such as neutrophils and excess fibrous 
connective tissue in the vicinity of the tumour. 


plausible cellular and molecular explanation 
for increased adipocyte interaction with the 
cells of the pro-inflammatory and pro-fibrotic 
tumour microenvironment that accelerates 
disease progression and hampers therapy. Is 
systemically targeting tumour-associated neu- 
trophils, pancreatic stellate cells or adipocytes 
feasible without causing collateral damage to 
host functions? Such damage could be a major 
challenge to successful direct translation of 
the current findings to the clinic. Only time 
will tell whether targeting IL-1 or neutro- 
phils could offer opportunities for successful 
therapeutic intervention. Of course, the best 
preventive approach in the meantime is to eat 
a healthy diet and to exercise. 


Melek Canan Arkan is at the Institute of 
Biochemistry II, Goethe University, Frankfurt 
60590, and the Institute for Tumor Biology 
and Experimental Therapy, Georg-Speyer 
Haus, Frankfurt 60596, Germany. 

e-mail: arkan@med.uni-frankfurt.de 


1. Giovannucci, E. & Michaud, D. Gastroenterology 132, 2208–2225 (2007).
2. Incio, J. et al. Cancer Discov. http://dx.doi.org/10.1158/2159-8290.CD-15-1177 (2016).
3. Rhim, A. D. et al. Cancer Cell 25, 735–747 (2014).
4. Ozdemir, B. C. et al. Cancer Cell 25, 719–734 (2014).
5. DuFort, C. C., Paszek, M. J. & Weaver, V. M. Nature Rev. Mol. Cell Biol. 12, 308–319 (2011).
6. Provenzano, P. P. & Hingorani, S. R. Br. J. Cancer 108, 1–8 (2013).
7. Shook, B. et al. Annu. Rev. Cell Dev. Biol. http://dx.doi.org/10.1146/annurev-cellbio-111315-125426 (2016).
8. Guilak, F. et al. Cell Stem Cell 5, 17–26 (2009).


This article was published online on 3 August 2016. 


ARTICLE 


doi:10.1038/nature19078 


Synergistic, ultrafast mass storage and 
removal in artificial mixed conductors 


Chia-Chin Chen¹, Lijun Fu¹ & Joachim Maier¹ 


Mixed conductors—single phases that conduct electronically and ionically—enable stoichiometric variations in a material 
and, therefore, mass storage and redistribution, for example, in battery electrodes. We have considered how such 
properties may be achieved synergistically in solid two-phase systems, forming artificial mixed conductors. Previously 
investigated composites suffered from poor kinetics and did not allow for a clear determination of such stoichiometric 
variations. Here we show, using electrochemical and chemical methods, that a melt-processed composite of the ‘super- 
ionic’ conductor RbAg₄I₅ and the electronic conductor graphite exhibits both a remarkable silver excess and a silver 
deficiency, similar to those found in single-phase mixed conductors, even though such behaviour is not possible in 
the individual phases. Furthermore, the kinetics of silver uptake and release is very fast. Evaluating the upper limit set 
by interfacial ambipolar diffusion reveals chemical diffusion coefficients that are even higher than those achieved for 
sodium chloride in bulk liquid water. These results could potentially stimulate systematic research into powerful, even
mesoscopic, artificial mixed conductors. 


Mixed conductors form an important class of functional solids. They 
are relevant as prototype solids, of which purely electronic conductors 
and purely ionic conductors are special subcases, and allow for rapidly 
transducing chemical signals and permeating chemical components’ *. 
Therefore, they are vital in a technological context for use as electrodes, 
permeation membranes, sensors and catalysts. In the field of solid-state 
chemistry, mixed conductors are important because they enable rapid 
solid-state reactions; in solid-state physics, they have become popular 
in conjunction with the advent of high-temperature superconductivity. 

Mixed conductors are characterized by a key thermodynamic
parameter and a key kinetic parameter. The former is the chemical
capacitance (Cᵟ, indicating a capacitance enabled by non-stoichiometry),
which measures the ability of the conductor to
take up or release chemical components such as oxygen, hydrogen,
lithium and silver (Supplementary Information section I). The latter
is the chemical diffusion coefficient (Dᵟ), which measures, for a given
geometry and driving force, the rate of such processes⁵, and represents
the most important parameter in the field of chemical kinetics of solids.
Moreover, it is typically the decisive quantity in battery research, when
referring to practical energy densities. In addition to the chemical
capacitance, Dᵟ also depends on the chemical resistance (Rᵟ),
which itself is composed of contributions from ionic and electronic
conductivities and, as such, determines the permeation rate of a
component in the steady state⁶.

A characteristic feature of mixed conductors is their ability to
exhibit varied stoichiometry, as observed in, for example, intercalation
electrodes for Li- or Na-based batteries⁷, permeation membranes⁸,
chemical sensors⁹, high-temperature superconductors¹⁰ and materials
for resistive switching¹¹,¹². Examples of previously investigated mixed
conductors are silver chalcogenides (Ag₂Y, Y = S, Se, Te; refs 13–15). In
these examples, silver excess (Ag₂₊δY, with δ > 0) and silver deficiency
(δ < 0) are realizable. Mechanistically, the former is achieved by incorporating
silver ions on interstitial sites that are compensated by
excess electrons and the latter by forming silver ion vacancies that are
compensated by electron holes; the kinetics occurs via motion of ionic
and electronic carriers coupled through charge conservation. Such
stoichiometry variations occur rapidly only at elevated temperatures,
especially for the highly conducting high-temperature phases¹⁶⁻²¹. In
pure ionic conductors (such as purely Ag⁺-conducting silver halides or
the ‘super-ionic’ conductor RbAg₄I₅) and pure electronic conductors,
perceptible stoichiometry changes are not possible, owing to the lack
of one necessary carrier. This is not just a kinetic issue: in RbAg₄I₅,
for example, the high electronic energy forbids the accommodation of
a large number of excess electrons or electron holes²².

Owing to the lack of mixed conductors with high electronic and 
ionic conductivities at room temperature, it is tempting to create 
‘artificial’ or heterogeneous mixed conductors by fabricating composites
consisting of compatible phases that are highly ionically and
electronically conducting. Fast steady-state transport (of a component 
such as silver or oxygen) can be realized in a straightforward manner in 
bi-continuous composites of ion and electron conductors (for example, 
a ceramic and a metallic material) because the transport pathways can 
be spatially separated (dual-phase transport)***; however, enabling 
extra storage of matter in composite materials is delicate, because it 
relies on contact phenomena. These phenomena have been studied 
in the context of introducing a lithium, or even a hydrogen, excess in 
Li₂O:Ru or LiF:Ni composites²⁵,²⁶ via ‘job-sharing’, whereby Li⁺ or
H⁺ is stored on the salt side of the contact and e⁻ or H⁻ on the metal
side. However, these systems suffer from slow kinetics and the poorly 
defined contact chemistry. 

Compositional variations due to excess charge have been discussed”” 
on a general thermodynamic level; they are involved in various 
grain-boundary and surface phenomena in ceramics**~*? or, more 
generally, when heterogeneous doping is considered as a function of 
stoichiometry*”**. Compositional changes at heterophase contacts 
are implicitly addressed in the field of supercapacitors. Here we put 
forward the concept of considering a composite of an ion conductor 
and an electron conductor (or more generally of two mixed conduc- 
tors) as a heterogeneous mixed conductor that can take up or release 
components reversibly. Our approach is far more general than the 
supercapacitive view because (i) it allows for a thermodynamic and 
kinetic unification of phenomena such as stoichiometric variations 
in heterogeneous systems, (ii) it allows us to formulate the transition 
from a supercapacitive situation to a homogeneous bulk situation via 
mesoscopic intermediates’, (iii) it allows us to couple the formalism of 
defect chemistry to the problem of boundary composition, and (iv) it 
highlights the possibility of fast lateral interfacial chemical diffusion. 
Here we study the RbAg₄I₅:graphite composite as a prototype of an
artificial mixed conductor. RbAg₄I₅ is a super-ionic conductor, with
extremely high silver-ion conductivity and negligible electronic or 
anion conductivity”. Graphite is a purely electronic conductor with 
the ability to exhibit n- and p-type transport*>”°. The interfacial polari- 
zation of silver halides and alkali silver halides in contact with graphite 
has been studied previously*”~, but the storage behaviour has not yet 
been explored. The artificial mixed conductor we study exhibits two 
anomalies. First, unlike the pure phases that are stoichiometrically inert, 
the mixed conductor allows for not only a substantial silver excess, 
but—characteristically—also a silver deficiency; therefore, it effectively 
shows a veritable homogeneity range, similarly to a mixed conducting 
bulk phase, which enables a generalized treatment of stoichiometric 
changes in interfacially controlled materials. Second, this stoichiomet- 
ric variation propagates very rapidly along the interface. Even though 
the transport kinetics for bi-continuous phases takes advantage of bulk 
migration pathways, it is the interfacial transport path that is conceptu- 
ally most intriguing, for the following reasons: (i) it sets an upper limit 
to the relaxation time; (ii) it is characterized by an ambipolar diffusion 
process, unlike the bulk path, and so the chemical diffusion coefficients 
can be compared with the corresponding values in pure phases; and 
(iii) it determines the kinetics of the non-compacted composites that 
we study. If such composites are used as components of electrochemical 
cells, then very high rate performances are achieved, supported by the 
fact that the super-ionic conductor can naturally be used as a solid 
electrolyte and that usual passivation problems do not arise. 


Equilibrium stoichiometric variation 

To add or remove silver we use the coulometric cell
Ag/RbAg₄I₅/RbAg₄I₅:graphite/Pt. By passing a current through the cell for a given
length of time, silver titration (by charge Q) of the composite is achieved 
(Fig. 1). The limit of silver addition is reached when silver is deposited 
on the Pt side and the open-circuit cell voltage (E) is zero; the limit of 
any silver removal is indicated by the oxidation of the iodine ion (that 
is, liberation of I₂), which starts at a voltage of approximately 470 mV,
consistent with observations*’ and with the standard decomposition
voltage⁴¹ of approximately 670 mV. For pure RbAg₄I₅, the change
between these two limiting values (0 and 470 mV) is abrupt, which 
indicates negligible stoichiometric variation of Ag, in correspondence 
with the extremely low electronic-carrier concentration in the material. 
In the case of carbon, the E(Q) curve exhibits an abrupt change from 
high, not-well-defined values to zero, which also indicates zero silver 
storage. The result for the composite is markedly different. Figure 1a
shows the expected transition from the I₂ liberation voltage to zero
voltage, but with substantial stoichiometric variation in between. Silver 
excess and deficiency are realized, as in usual mixed conductors (for 
example, Ag₂S, Ag₂Se and Ag₂Te)!*!”!8. In contrast to these mixed 
conductors, the stoichiometric zero point of the charge-voltage curve 
of our composite is well defined because it refers to the starting point 
after putting two very stoichiometric (on the scale of interest) phases 
into contact (Supplementary Information section I and Supplementary 
Fig. 3). 
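The bookkeeping behind such a coulometric titration follows from Faraday's law: a charge Q = I·t moves Q/F moles of singly charged silver ions. The block below is a minimal sketch of that arithmetic, not the authors' evaluation procedure. The function names, the sign convention (positive current adds silver) and the example current, duration and sample amount are hypothetical values chosen only for illustration.

```python
FARADAY = 96485.33212  # Faraday constant, C per mol of elementary charge


def silver_transferred_mol(current_a: float, duration_s: float) -> float:
    """Moles of Ag moved by a constant current (one electron per Ag+ ion)."""
    charge_c = current_a * duration_s  # Q = I * t
    return charge_c / FARADAY


def nonstoichiometry(current_a: float, duration_s: float,
                     formula_units_mol: float) -> float:
    """Average change in silver content per formula unit of the sample.

    Positive current is taken to add silver (assumed sign convention).
    """
    if formula_units_mol <= 0:
        raise ValueError("formula_units_mol must be positive")
    return silver_transferred_mol(current_a, duration_s) / formula_units_mol


if __name__ == "__main__":
    # Hypothetical run: 10 microamps for 100 s into 1e-4 mol of composite.
    delta = nonstoichiometry(10e-6, 100.0, 1e-4)
    print(f"average delta = {delta:.3e}")
```

Because the result is normalized by the whole sample, it is an average over the composite; the local value at the interface can be much larger.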

The silver excess is realized by silver interstitials (Ag_i•) on the ionic
conductor side and excess electrons (e′) on the electronic conductor
side, whereas the deficiency is realized by silver vacancies (V_Ag′) on one
side and electron holes (h•) on the other side (Fig. 1b). We briefly
discuss the theoretical background in Supplementary Information 
section I. A quantitative and comprehensive thermodynamic treatment 
of the stoichiometric variation of these job-sharing composites is 
beyond the scope of this Article. It can be shown that the effect of added 
or removed charge on the voltage can generally be split into a contri- 
bution from the diffuse double layer and a contribution from the poten- 
tial jump over the interface’®, with the first being negligible, owing to 
Figure 1 | Silver excess and deficiency through interfacial storage.
a, Variation of silver content in a RbAg₄I₅:graphite (90%:10%) composite,
achieved by coulometric titration. The potential range is bound by
decomposition (I₂ liberation, left) and Ag deposition (right) without
indication of underpotential deposition. b, Schematic of interfacial
Ag deficiency and excess. On the ionic-conductor (RbAg₄I₅) side, the
deficiency is realized by Ag vacancies (V_Ag′) and the excess by interstitials
(Ag_i•); on the electronic-conductor (graphite) side, the deficiency is
realized by electron holes (h•) and the excess by excess electrons (e′).
c, Storage capacity as a function of volume fraction, measured by
coulometric titration between 10 mV and 400 mV. The composite (centre
inset) shows distinct storage capacity, unlike the pure phases (carbon, left
inset; RbAg₄I₅, right inset). The position of the maximum varies if
different normalization is used (for example, per mole number), but the
general trend remains that the capacity increases with the contact area
between the different phases. Error bars show range over three samples.


the high bulk ionic- and electronic-charge-carrier contributions, 
leading to the rather linear E-Q characteristic. 

Stoichiometric effects increase with the contact area between the two 
materials, and are zero for the pure constituents (Fig. 1c). Dispersions 
of graphite in RbAg₄I₅ and of RbAg₄I₅ in graphite lead to an increase 
Figure 2 | Rapid silver dissolution in the job-sharing composite. a, The
relaxation process of the coulometric titration shows anomalously
rapid silver storage/removal in the RbAg₄I₅:graphite composite. b,
Ag permeation experiment. At t = 10 min, the right-hand working
electrode (WE) is suddenly brought into contact with silver. (Blue parts of
the cells refer to high Ag potential.) For the ‘job-sharing’ composites, the
open-circuit cell voltage (E, measuring the silver chemical potential at Δx)
approaches zero very quickly. Unlike RbAg₄I₅:graphite (filled circles), in
the case of Ag₂S (open circles) E has not changed even after 2 h. The inset
refers to an experiment that was specially designed to follow the transient
and enable evaluation of the effective diffusion coefficient. (The dashed


in the capacity, with the maximum achieved at 40 vol% RbAg₄I₅. The
correlation with the interfacial area is also highlighted by our results on
RbAg₄I₅ composites with multi-wall carbon nanotubes (Supplementary
Fig. 10). Note that the δ-value in Fig. 1 averages over the whole coarse-grained
sample. The local δ-value is very high and corresponds to a
silver excess of about 15% in the monolayer where the phases contact
one another (on the basis of Brunauer–Emmett–Teller measurements;
Supplementary Information section I).


Kinetics of silver storage and removal 

Equally exciting as the finding of a compositional variation ranging 
from deficiency to excess is the kinetics. A typical transient of the 
coulombic titration is given in Fig. 2a. By considering the composite 
as an effective mixed conductor, an effective diffusion coefficient is 
derived that is extremely high and, for the non-compacted composite, 
is 4 × 10⁻⁴ cm² s⁻¹ (Supplementary Information section II). 
This value agrees well with that predicted from the analysis of 
job-sharing diffusion along the interface. The minor importance 
of bulk migration (followed by double-layer charging) is under- 
standable from the partly mesoscopic microstructure and the 
wetting behaviour, as detailed in Supplementary Information section
II. Such rapid changes in stoichiometry at room temperature are
line in the inset marks the time lag. The evaluation of this time lag as well
as the evaluation of the long-time behaviour yield Dᵟ with a tolerable
deviation indicated in the brackets.) c, Comparison of the chemical
diffusion coefficient Dᵟ determined from titration data (circles) and
theoretical calculations (solid line). The prediction by the job-sharing
model is in excellent agreement with the data. d, Contributions of various
capacitive terms to Dᵟ (1/Cᵟ is the reciprocal chemical capacitance). The
characteristic term resulting from the electrostatic field effect (red
squares) is dominant and increases the chemical diffusion in the
composite to unprecedentedly high values. For definition of the quantities,
see Table 1.


unprecedented, and the chemical diffusion coefficients for the inter- 
facial transport exceed bulk values for known systems by orders of 
magnitudes. Owing to the slow room-temperature kinetics, there 
are virtually no room-temperature measurements of chemical diffu- 
sion coefficients, except for lithium diffusion in batteries. The values 
reported there⁴² are more than five orders of magnitude lower than 
those for our super-ionic conductor/graphite composite. Also, the 


Table 1 | Chemical resistance cells and chemical capacitance for
chemical diffusion cells

Mechanism                                1/Rᵟ                                       1/Cᵟ
Classic bulk chemical diffusion          ∝ σ_ion σ_eon/(σ_ion + σ_eon)              ∝ 1/c_ion + 1/c_eon
(α = Ag₂Te, LiFePO₄, SrTiO₃ and so on)
Job-sharing chemical diffusion           ∝ σ^α_ion σ^β_eon/(σ^α_ion + σ^β_eon)      ∝ 1/c^α_ion + 1/c^β_eon + F²s/(RTεε₀)
(α = RbAg₄I₅, β = graphite)

The product of chemical resistance Rᵟ and chemical capacitance Cᵟ yields the relaxation time
τ ∝ 1/Dᵟ (proportionality constant given by L²). The expression given for 1/Cᵟ is valid only for the
dilute approximation and in the absence of trapping. It is necessary to introduce activity
coefficients as well-known thermodynamic corrections in the case of high charge-carrier
concentration (Supplementary Information section IV), in particular for RbAg₄I₅. Because of the
dominance of the third term in the expression for 1/Cᵟ for the composite, the qualitative and
quantitative conclusions are not affected by such correction factors. σ_ion(eon), ionic (electronic)
conductivity; c_ion(eon), ionic (electronic) concentration; s, separation distance of the atoms; F,
Faraday’s constant; R, gas constant; T, temperature; εε₀, absolute dielectric constant.
Table 2 | Comparison of chemical diffusion coefficients Dᵟ at 25 °C

Material                       Source                   Dᵟ (10⁻⁵ cm² s⁻¹)
RbAg₄I₅/graphite composites    Coulometric titration    30–50
RbAg₄I₅/graphite composites    Ag permeation            50 ± 20
RbAg₄I₅/graphite composites    Calculation              30–60
RbAg₄I₅                        Ref. 49                  0.0017
NaCl in water                  Ref. 48                  1.6

The indicated range of the experimental results for the coulometric titration and of the
calculations refers to the entire stoichiometry range (Fig. 2c). The error bars concerning the
experimental Ag-permeation data represent the range of values from two evaluation procedures
(inset, Fig. 2b).


values for high-temperature diffusors when extrapolated to room 
temperature are distinctly smaller than our value (Fig. 3b). 

The very high diffusion coefficients for our composite are due to 
the very low chemical resistance compared to the pure constituents 
(ions take the ionic pathways in RbAg₄I₅ and electrons the electronic 
pathways in graphite) and, more subtly, to the low (differential) 
chemical capacitance. The kinetics is faster than for a hypothetical 
bulk mixed conductor with same charge-carrier concentrations and 
mobilities. The reason for the fast kinetics is the third term in the 
equation for 1/Cᵟ in the second row of Table 1, which refers to the 
electrostatic energy that needs to be overcome if the concentration 
is changed at the heterojunction. As shown in Fig. 2c, d, this term 
dominates the other two terms in this equation by an order of 
magnitude. This dominance is even more evident if we include 
‘non-ideality’ terms’? (Supplementary Information sections III, IV). 
The 1/c_eon term is as important as the third term only at the point of
zero charge, leading to the small maximum in Dᵟ seen in Fig. 2c, d.
The calculation of Dᵟ using known material parameters yields
5 × 10⁻⁴ cm² s⁻¹, in excellent agreement with the experimental value
of the chemical diffusion coefficient (Table 2).
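To give a feel for what such a diffusion coefficient means in practice, the relation in the Table 1 caption (relaxation time proportional to 1/Dᵟ, with proportionality constant L²) can be turned into a rough estimate. The sketch below assumes a geometry factor of 1 and a hypothetical diffusion length of 1 mm; only the two Dᵟ values are taken from the text and Table 2. It is an order-of-magnitude illustration, not the authors' analysis.

```python
def relaxation_time_s(length_cm: float, d_chem_cm2_s: float,
                      geometry_factor: float = 1.0) -> float:
    """Estimate tau ~ geometry_factor * L^2 / D (geometry factor assumed to be 1)."""
    if length_cm <= 0 or d_chem_cm2_s <= 0:
        raise ValueError("length and diffusion coefficient must be positive")
    return geometry_factor * length_cm ** 2 / d_chem_cm2_s


if __name__ == "__main__":
    length_cm = 0.1            # hypothetical diffusion length (1 mm)
    d_composite = 4e-4         # cm^2/s, composite value quoted in the text
    d_rbag4i5 = 0.0017e-5      # cm^2/s, pure RbAg4I5 value from Table 2

    print(f"composite: {relaxation_time_s(length_cm, d_composite):.3g} s")
    print(f"RbAg4I5:   {relaxation_time_s(length_cm, d_rbag4i5):.3g} s")
```

For a fixed length the ratio of the two relaxation times equals the inverse ratio of the diffusion coefficients, so the choice of L affects only the absolute numbers.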

This value of Dᵟ also follows formally from a supercapacitive 
(de Levie-type) model* in which the boundaries are charged from 
the bulk side by shrinking the bulk phases to single layers. The elec- 
trochemical capacitance is then expressed in terms of double-layer 


Classic bulk 
Figure 3 | Rapid job-sharing diffusion. a, Schematic of the different 
chemical diffusion mechanisms (bulk and job-sharing). In both cases, 
coupled motion occurs during which the slower component slows down 
the effective conductivity. The composite phases can be selected such 
that both contributions are high. Even for the same conductivities and 
carrier concentrations, the chemical diffusion is faster in the composite 
because the compositional change is less important, owing to the internal 
electrostatic energy, as indicated by the lower depths of the grey boxes. 
The grey arrows indicate supercapacitive charging contributions that 
charging with the chemical part stemming from the configurational
entropy of the charge on the capacitor plates (Supplementary
Information section III). In the case of a bi-continuous composite with
major contributions from bulk migration, the problem is more compli- 
cated because of the greatly increased dimensionality and so needs to be 
addressed using finite element modelling. (The relaxation time for ideal 
bulk contributions, which cannot be interpreted in terms of a micro- 
scopic diffusion coefficient, is addressed below and shown to be of the 
order of tens of milliseconds for the grain sizes under consideration.) 

The schematic in Fig. 3a illustrates the effects of chemical resistance 
and capacitance that contribute to the anomalously high chemical 
diffusion of the job-sharing composites along with the bulk-assisted 
supercapacitive mode (grey arrows). Table 2 highlights the fact that the 
observed value of Dᵟ and that derived from the ambipolar interfacial 
transport model are unprecedentedly high, exceeding the values for 
other solid room-temperature systems, and even exceeding the diffusion 
coefficients of NaCl in liquid water, by orders of magnitude (Fig. 3b). 
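The size of these gaps can be checked directly from the Table 2 entries. The sketch below takes 40 × 10⁻⁵ cm² s⁻¹ as a representative composite value (the middle of the titration range, matching the 4 × 10⁻⁴ cm² s⁻¹ quoted in the text); that choice and the helper name are assumptions made for illustration.

```python
import math

# Chemical diffusion coefficients in cm^2/s, taken from Table 2.
D_CHEM = {
    "composite": 40e-5,        # representative value within the 30-50 range (assumed)
    "RbAg4I5": 0.0017e-5,
    "NaCl in water": 1.6e-5,
}


def orders_of_magnitude_above(reference: str, material: str = "composite") -> float:
    """Base-10 logarithm of the ratio D(material) / D(reference)."""
    return math.log10(D_CHEM[material] / D_CHEM[reference])


if __name__ == "__main__":
    for name in ("RbAg4I5", "NaCl in water"):
        print(f"composite vs {name}: 10^{orders_of_magnitude_above(name):.2f}")
```

With these inputs the composite lies a little more than one order of magnitude above NaCl in water and more than four orders above pure RbAg₄I₅.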

Further corroboration of fast transport comes from another 
experiment. In this experiment, we track silver dissolution (that is, the 
propagation of the chemical potential of silver) using electrochemical 
detection, as sketched in Fig. 2b. As the initial condition, a jump in the 
chemical potential is created at the outer side of the composite (working 
electrode) by bringing it into contact with silver®. 

Figure 2b refers to a thermodynamically well-defined set-up (see 
Methods) and shows that the silver signal covered the distance Δx and
thus reached the sensing point in less than a minute, leading to zero 
potential difference. The inset refers to a refined experiment that allows 
us to resolve the relaxation time. According to a detailed evaluation 
(Supplementary Information section II), this relaxation time agrees
well with the diffusion coefficient given above. Such a quick response 
is not observed for mixed conductors such as Ag2S (Fig. 2b) and Ag2Se
(Supplementary Fig. II(B)), as the comparison shows. 
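
As a rough plausibility check on the time scale quoted above, the one-dimensional estimate Δx² ≈ 2Dt converts a propagation distance and an arrival time into a lower bound on the chemical diffusion coefficient. The sketch below is an order-of-magnitude illustration only: the relation, the function name and the example distance of 1 mm (roughly half of the 2-3 mm working-electrode length given in Methods) are assumptions made here, and it is not the detailed evaluation of the Supplementary Information.

```python
def diffusion_lower_bound(distance_cm: float, arrival_time_s: float) -> float:
    """Lower bound D >= x^2 / (2 t) in cm^2/s, assuming a simple 1D diffusion front."""
    if distance_cm <= 0 or arrival_time_s <= 0:
        raise ValueError("distance and time must be positive")
    return distance_cm ** 2 / (2.0 * arrival_time_s)


# Hypothetical inputs: the signal travels 0.1 cm in at most 60 s.
d_min = diffusion_lower_bound(0.1, 60.0)
print(f"D >= {d_min:.1e} cm^2/s")
```

Because the signal arrived in less than a minute, any value obtained this way can only understate the true coefficient.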

To demonstrate the usefulness of super-ionic conductor/graphite 
composites in electrochemical devices*”-*"*, we briefly discuss its 
benefit from bulk migration and are dominant for bi-continuous bulk
percolation. b, Comparison of chemical diffusion coefficients for
different materials at room temperature (dashed lines are extrapolated
from ref. 14 (Ag2Te) or ref. 50 (AgCl)). Values for RbAg4I5 are from
ref. 49. The value for our composite (red circle) exceeds the reported
values for other solids (including LiCoO2 (ref. 42), green triangle)
and even for NaCl in liquid water (blue square) by at least one order
of magnitude. (Bold elements in brackets indicate the transported
elements.)
Figure 4 | Ultrafast performance of electrochemical devices using
RbAg4I5:graphite. a, The capacity of an all-solid-state battery at different
C rates. As per common practice, a C rate of 1 denotes the rate at which a
full charge or discharge takes 1 h; for example, it takes 3 s at 1,200C. Inset,
voltage-capacity curves. b, All-solid-state supercapacitor. Here, we realize
silver excess (interstitials on the RbAg4I5 side, electrons on the graphite
side) on one side of the cell and silver deficiency (vacancies on the RbAg4I5


performance in two applications. For these experiments, we prepared 
samples with a maximum of bulk percolation by applying substantial 
pressure to the samples during the preparation. In the first experiment 
(Fig. 4a), the micrometre-sized composite is used as an electrode in a 
battery cell and can be reversibly charged at extremely high rates. In 
this case, we are able to use the entire effective stoichiometric width of 
the composite. In the second experiment (Fig. 4b), the composite is 
used within a supercapacitor cell, which has a relaxation time 
characterized from impedance spectroscopy of 40 ms. We are not aware 
of a quicker relaxation process for an all-solid-state supercapacitor 
device. Even devices based on liquid systems exhibiting such values are 
considered to be extremely fast*®. Note that here we refer to silver excess 
(Ag_i•/e′) on one side and to silver deficiency (V_Ag′/h•) on the other. The
observed relaxation time is close to the ideal supercapacitive response 
time (Supplementary Information section III), indicating essentially 
bulk percolation; however, owing to the non-ideal microstructure, 
favourable bridging effects, according to the described interfacial 
diffusion over short distances, can be assumed to occur. It can be 
conjectured that, in many supercapacitors (particularly those with a 
partial covalency or partial Faradaic process), the lack of such a 
possibility could be a reason for decreased practical rate performance. 
Details of such studies are beyond the scope of this Article. 
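
The C-rate convention used for the battery experiment (a C rate of 1 corresponds to a full charge or discharge in 1 h) can be made concrete with a short conversion. This is a minimal sketch; the function name is hypothetical and the rates in the loop are illustrative values, with 1,200C taken from the Fig. 4 legend.

```python
def charge_time_seconds(c_rate: float) -> float:
    """Time for one full charge or discharge at a given C rate (1C = 1 h)."""
    if c_rate <= 0:
        raise ValueError("C rate must be positive")
    return 3600.0 / c_rate


for rate in (1, 60, 1200):
    print(f"{rate}C -> {charge_time_seconds(rate):g} s")
```

At 1,200C the function returns 3 s, in line with the example given in the Fig. 4 legend.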

If both bulk and interfacial transport are macroscopically relevant, 
then we can conclude that it is the upper limit of the measured 
relaxation time that is determined by the interfacial diffusion and 
thus by the component redistribution kinetics along the percolating 
interfaces of the composites. In the context of supercapacitive action, 
beyond the quick build-up of boundary polarization via bulk transport 
(charging of capacitor plates from outside)*”-*?, the rapid propagation 
of this electrochemical polarization state along the boundaries is 
a key issue (transport along the capacitor plates). These boundary 
effects will become more dominant for composites of weak conduc- 
tors where migration processes in the bulk are less important and, in 
particular, in systems in which the species of interest is not conduc- 
tive at all in the bulk (such as dissociative hydrogen incorporation in 
Li2O-Ru composites). A preference of transport along the interface
as compared to lateral charging is also realized in strongly anisotropic 
materials (such as graphite) and in thin film systems of nanometre-scale 
dimensions as partly realized in our non-compacted composites 
(Supplementary Information section II). Such effects will be highly 
side, holes on the graphite side) on the other. The frequency dependence of the
real (C′; left axis) and imaginary (C″; right axis) parts of the capacitance
shows an extremely short relaxation time of τ0 = 40 ms, indicating ultrafast
charge/discharge rates. The short relaxation time also suggests that local
supercapacitive action is assisted by interfacial neutral mass transport
(redistribution of the polarized state).


relevant for targeted future artificial conductors with extreme storage 
capacities (monolayer heterostructures of fast ionic and electronic 
conductors), for chemical transport along grain boundaries in ceramics 
and for surface processes in catalysis (such as spill-over). 


Conclusion 

Here we show that powerful mixed conductors can be constructed by 
forming appropriate composites of super-ionic conductors and pure 
electronic conductors, displaying a proper homogeneity range with 
excess and deficiency. This homogeneity range enables the design 
of new mixed conductors, for example, those without redox-active 
elements, and generalization of the stoichiometry concept for hetero- 
geneous systems. The local storage capacity can be very high, promising 
pronounced storage capacities in adequately nanostructured devices. 
Moreover, a unified treatment of mass-transport kinetics in heteroge- 
neous media is enabled by our approach. Storage in such composites 
can be extremely fast, owing to bulk migration and the astonishingly 
fast chemical diffusion along the boundaries. This interfacial diffusion 
path will be decisive for future artificial mixed conductors based on 
mesoscopic phase constituents. 

We believe that this study will stimulate systematic research on 
artificial, fast, mixed conductors with substantial potential for super- 
capacitor systems, electrodes, permeation membranes or catalysts, by 
constructing relevant nanometre-scale heterostructures in analogy 
to artificial ion-conductor systems*’. Besides the practical aspects, 
we emphasize the conceptual broadening achieved by considering 
thermodynamics and kinetics of heterogeneous storage. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Synthesis of RbAg4I5 and RbAg4I5/carbon composites. RbAg4I5 was synthesized
by melting a near-stoichiometric mixture (83:17 mol%) of AgI (Sigma-Aldrich,
99.999%) and RbI (Sigma-Aldrich, 99.9%) at 450 °C for 2 h, followed by rapid
cooling. For RbAg4I5/graphite composites, AgI/RbI was first mixed with the
appropriate amount of graphite (Alfa Aesar, 99.9995%) in an agate mortar. (The
water content of graphite is less than 100 p.p.m., according to thermogravimetric
analysis.) To improve the RbAg4I5/graphite contact, the mixture was melted at
450 °C for 10 h. For composites of RbAg4I5 and multi-wall carbon nanotubes
(MWCNTs), the procedure was the same except that graphite was replaced with
MWCNTs (Sigma-Aldrich, 98%; the dimensions of the nanotubes are 10 nm (outer
diameter), 4.5 nm (inner diameter) and 6 μm (length)). All syntheses were carried
out in the dark under Ar atmosphere.

Characterization method. Structure and crystallinity were characterized by
X-ray diffraction (XRD) using a Philips PW3710 (40 kV/30 mA) with Cu-Kα
radiation. Scanning electron microscopy (SEM) analysis was performed using a
Zeiss Gemini DSM 982 scanning electron microscope. The nitrogen adsorption
and desorption isotherms were measured using an Autosorb-1 system
(Quantachrome). Transmission electron microscopy (TEM) was performed with a
Philips CM30 ST (300 kV, LaB6 cathode). Thermogravimetry was carried out with a
Netzsch STA449C Jupiter TG.

Electrochemical experiments. The prepared RbAg4I5 and RbAg4I5/graphite
composites served as the solid electrolyte and cathode, respectively. A mixture
of silver powder (Sigma-Aldrich, 99.9%) and RbAg4I5 in the weight ratio of 4:1
was used as the anode. Powder compacts of each cell component were uniaxially
pressed into 5-mm-diameter pellets at 35 MPa for 1 min. Graphite pellets are
difficult to prepare by cold pressing; hence, graphite rods (Sigma-Aldrich,
99.999%) were used instead. Platinum foil and Ag foil were used as cathodic and
anodic current collectors, respectively. After assembling all the components,
the batteries were placed in gas-tight quartz set-ups. To reduce the contact
resistance, the cell was slightly spring-loaded. Under Ar atmosphere, the
electrochemical experiments were performed in the dark at 25 °C on an Arbin
MSTAT system.

The coulometric titration experiment (that is, galvanostatic intermittent 
titration technique, GITT) consists of a series of current pulses, followed by an 
equilibration period. Constant currents for specific intervals of time were then 
applied. Depending on the volume ratio, the time intervals ranged from 30s to 
3 min for RbAg4I5/graphite composites (10~° A/g) and from 3 min to 5 min for
RbAg4I5/MWCNT composites (10⁻⁴ A/g). After switching off the current, the
open-circuit potential was measured until the system had reached equilibrium.
The equilibration time typically ranged from 15 min to 60 min. 
For the extremely slow galvanostatic charge/discharge on pure RbAg4I5, a very
small constant current of 4 x 10-1? A/g was supplied by a Keithley 220 current 
source. The potential was measured by a Keithley 6514 electrometer. 

Besides battery cells, all-solid-state supercapacitors were fabricated by
sandwiching a RbAg4I5 pellet in between two RbAg4I5/graphite composite pellets.
The preparation methods and the pellet sizes were the same as mentioned earlier. 
Platinum foils were used as current collectors. For the rate evaluation, the electro- 
chemical cells (batteries and supercapacitors) were tested in Swagelok-type cells. 
The assembly was done in an Ar-filled glovebox (O2 content of <0.3 p.p.m.,
H2O content of <0.1 p.p.m.). The electrochemical impedance spectroscopy was
conducted by a Voltalab PGZ402 impedance analyser with 5-mV signal amplitude. 
The frequencies ranged from 10° Hz to 0.01 Hz. The relaxation time of the super- 
capacitor was obtained by analysing the frequency dependence of the real and 
imaginary parts of the complex capacitance51.
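
To illustrate how a relaxation time can be read from impedance data, the sketch below uses the standard complex-capacitance definitions, C′(ω) = −Z″/(ω|Z|²) and C″(ω) = Z′/(ω|Z|²), and takes τ0 = 1/f0 at the frequency f0 where C″ peaks. This is our reading of the analysis in ref. 51, not the authors' code. The series-RC test cell, its resistance and capacitance, the frequency grid and all function names are hypothetical, chosen so that the synthetic device relaxes in about 40 ms.

```python
import math


def complex_capacitance(freq_hz, impedance):
    """Return (C', C'') in farads from a complex impedance Z = Z' + jZ''."""
    omega = 2.0 * math.pi * freq_hz
    mag_sq = abs(impedance) ** 2
    c_real = -impedance.imag / (omega * mag_sq)
    c_imag = impedance.real / (omega * mag_sq)
    return c_real, c_imag


def relaxation_time(freqs_hz, impedances):
    """tau0 = 1 / f0, where f0 is the sampled frequency at which C'' is largest."""
    c_imag = [complex_capacitance(f, z)[1] for f, z in zip(freqs_hz, impedances)]
    f0 = freqs_hz[c_imag.index(max(c_imag))]
    return 1.0 / f0


# Hypothetical series-RC cell tuned so that tau0 = 2*pi*R*C = 40 ms.
R_OHM = 1.0
C_FARAD = 0.040 / (2.0 * math.pi * R_OHM)

# Hypothetical logarithmic sweep from 0.01 Hz to 100 kHz, 100 points per decade.
freqs = [10 ** (-2 + 0.01 * i) for i in range(701)]
z_values = [R_OHM + 1.0 / (1j * 2.0 * math.pi * f * C_FARAD) for f in freqs]

tau0 = relaxation_time(freqs, z_values)
print(f"tau0 = {tau0 * 1e3:.1f} ms")
```

The resolution of τ0 obtained this way is limited by the spacing of the frequency grid, so a real measurement would need enough points around the C″ maximum.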

Ag permeation experiment. Ag2S (Sigma-Aldrich, 99.9%), Ag2Se (Sigma-Aldrich)
and RbAg4I5/graphite composites were used as working electrodes. The anode,
electrolyte, anodic current collector and the pressing conditions were the same as 
for the electrochemical experiments. The sensor probe was similar to the electronic 
probe used in ref. 52. The length of the working electrode was about 2-3 mm. 
Pt wire was wrapped around the working electrode bar. The position of the Pt wire 
was in the centre of the bar. 

The electromotive force (E) was measured under Ar atmosphere between the Ag 
anode and the working electrode. E was very stable, at least for 10h. Before starting 
the permeation experiment, the quartz set-up was opened and the E measurement 
switched off. An Ag pellet was contacted and attached to the outer side of the 
working electrode. The measurement set-up was then closed and flushed with 
Ar. The reassembling process took about 1 min. Afterwards, E was recorded for 
at least 6h. 

Because the time lag of the RbAg4I5/graphite composite is shorter than the time
required for reassembling the set-up, a special measurement was performed to 
allow us to track the transient. For this purpose, silver was attached in situ without 
interrupting the circuit. Because this was done under air, the absolute emf values 
are less reliable. 


51. Taberna, P. L., Simon, P. & Fauvarque, J. F. Electrochemical characteristics
and impedance spectroscopy studies of carbon-carbon supercapacitors.
J. Electrochem. Soc. 150, A292-A300 (2003).

52. Sitte, W. Electrochemical cell for composition dependent measurements of
chemical diffusion coefficients and ionic conductivities on mixed conductors
and application to silver telluride at 160 °C. Solid State Ion. 59, 117-124
(1993).
Tempo and mode of genome evolution in
a 50,000-generation experiment
Adaptation by natural selection depends on the rates, effects and interactions of many mutations, making it difficult
to determine what proportion of mutations in an evolving lineage are beneficial. Here we analysed 264 complete 
genomes from 12 Escherichia coli populations to characterize their dynamics over 50,000 generations. The populations 
that retained the ancestral mutation rate support a model in which most fixed mutations are beneficial, the fraction of 
beneficial mutations declines as fitness rises, and neutral mutations accumulate at a constant rate. We also compared 
these populations to mutation-accumulation lines evolved under a bottlenecking regime that minimizes selection. 
Nonsynonymous mutations, intergenic mutations, insertions and deletions are overrepresented in the long-term 
populations, further supporting the inference that most mutations that reached high frequency were favoured by 
selection. These results illuminate the shifting balance of forces that govern genome evolution in populations adapting
to a new environment.


Comparative genomic studies have identified the molecular basis of 
adaptations including lactase permanence in humans’, domestication 
of plants” and animals’, and pathogenicity in bacteria*. Nevertheless, 
it is difficult to determine more generally what fraction of new muta- 
tions in an evolving lineage are beneficial. Answering this question 
is important for modelling sequence changes used in phylogenetic 
methods? and would inform debate about adaptive and non-adaptive 
modes of genome evolution®”. 

The combination of experimental evolution and genome sequencing 
provides a way forward that has been used with viruses, bacteria, yeast 
and flies*/?, In a study of bacteria, the diversity of mutations involved 
in adaptation to high-temperature stress was studied by sequencing 
>100 lineages after a 2,000-generation experiment”. In another study, 
sequencing a series of clones from one population over 40,000 genera- 
tions showed the trajectory of genome evolution’. However, a short-term 
experiment reveals only the early steps of adaptation, and it is difficult 
to distinguish adaptive ‘driver’ and non-adaptive ‘passenger’ mutations
when only one population is examined. Beneficial mutations can also 
be identified by lineage tracking" and genetic reconstruction’® experi- 
ments, but these approaches become impractical after an initial selective 
sweep or when mutations become too numerous over time, respectively. 

To overcome these limitations, we analysed complete genomes of 
264 clones from 12 populations across 50,000 generations of the long- 
term evolution experiment (LTEE) with E. coli!*'’. These populations 
have evolved in a defined medium with scarce resources since 1988. 
Mean fitness measured in competition with their ancestor increased by 
~70% in that time!’. The LTEE is a model system for studying many 
fundamental evolutionary questions”!>-”?. 


Genome-wide mutations and hypermutability 
We sequenced the genomes of two clones from each population after 
500, 1,000, 1,500, 2,000, 5,000, 10,000, 15,000, 20,000, 30,000, 40,000
and 50,000 generations using the Illumina platform (Supplementary
Data 1). We called mutations, including structural variants, using 
the breseq pipeline**?® . In total, we found 14,572 point mutations; 
500 insertions of insertion sequence (IS) elements; 726 deletions 
and 1,132 insertions each < 50 base pairs (bp) (small indels); and 
267 deletions and 45 duplications each >50 bp (large indels). After 
50,000 generations, average genome length declined by 63 kb (~1.4%) 
relative to the ancestor (Extended Data Fig. 1). Mutations were not 
distributed uniformly across the populations. Instead, six populations
(Ara−1, Ara−2, Ara−3, Ara−4, Ara+3 and Ara+6) had 96.5%
of the point mutations, having evolved hypermutable phenotypes 
caused by mutations that affect DNA repair or removal of oxidized 
nucleotides'®”°, Figure 1a shows the trajectories for the total mutations 
in all 12 populations; Fig. 1b is rescaled for better resolution of those 
that did not become point-mutation mutators. Hypermutability tended 
to decline over time as the load of deleterious mutations favoured 
antimutator alleles”°. All four populations that were hypermutable at 
10,000 generations accumulated synonymous substitutions (a proxy 
for the underlying point-mutation rate) between generations 40,000 
and 50,000 at much lower rates than from 10,000 to 20,000 generations 
(Extended Data Fig. 2). 

Increased numbers of IS elements can also cause hypermutability’®, 
with higher rates not only of transpositions but also deletions and dupli- 
cations through homologous recombination. In population Ara+1, 
31.8% of all mutations up to 50,000 generations were IS150 insertions, 
compared with 12.3% for the other populations that never evolved ele- 
vated point-mutation rates. This mode of hypermutability arose early in 
Ara+1; IS150 insertions are overrepresented in each Ara+1 clone from
5,000 generations onwards when compared individually to all other 
non-mutator clones from the same generation (Fisher's exact test with 
Bonferroni correction, P < 0.05). Two clones from other populations 
were also IS150 hypermutators by this test: 38.7% of the mutations in
Figure 1 | Total number of mutations over time in the 12 LTEE
populations. a, Total mutations in each population. b, Total mutations 
rescaled to reveal the trajectories for the six populations that did not 
become hypermutable for point mutations, and for the other six before 
they evolved hypermutability. Each symbol shows a sequenced genome; 
some points are hidden behind others. Each line passes through the 
average of the genomes from the same population and generation. 


a 30,000-generation clone from Ara−5 and 31.7% of the mutations
in a 40,000-generation clone from Ara−3 were IS150 insertions. The
aberrant Ara−5 clone shares only one mutation with other sequenced
Ara−5 clones, indicating early divergence; it does not share point mutations
with any other population, excluding cross-contamination. The
emergence of these various mutator types shows that evolution can 
alter the production of genetic diversity*®”’, which in turn changes the 
tempo and mode of genome evolution. 


Population phylogenies 

Figure 2a shows phylogenetic trees constructed using point mutations 
for each population; Fig. 2b shows the trees with branches rescaled 
after mutators evolved. Some populations, including Ara−2, which
became hypermutable early, and Ara−6, which never did, harbour
lineages that coexisted for tens of thousands of generations. Some
others, including Ara−4, which became hypermutable, and Ara+2,
which did not, are more linear in structure, without deep branches
among the sequenced clones. Deep branches were probably supported
by the diversity-promoting effects of negative-frequency-dependent
interactions, as shown in the Ara−2 population. Sequencing
whole-population samples would provide more detailed information 
on within-population diversity'>”. 


Dynamics of genome evolution 

The accumulation of point mutations increased greatly in hypermu- 
table populations®!*”°, potentially overwhelming the genomic signa- 
ture of adaptation. Although mutator lineages may experience higher 
rates of fitness improvement'””’, the effect is usually small owing to 
clonal interference between competing beneficial mutations”*”? and 
the increased load of deleterious mutations~”*”. Therefore, beneficial 
mutations become harder to detect in a sea of unselected mutations in 
mutator lineages. To understand better the dynamic coupling between 
adaptation and genome evolution, we first analysed the populations 
that retained the ancestral mutation rate up to 50,000 generations and 
the others before they became point-mutation or IS150 mutators. 

It was previously found’” that the mean-fitness trajectory of the LTEE 
is well described by a power-law relation, in which log fitness increases 
linearly with log time. Moreover, the power law accurately predicts 
fitness to 50,000 generations using data from only the first 5,000 gen- 
erations. It was shown that a population-dynamical model that incor- 
porates two phenomena known to be important in the LTEE—clonal 
interference***! and diminishing-returns epistasis'>”’—generates a 
power-law relation. This model in turn predicts that the number of 
beneficial mutations should increase with the square root of time!’. 
However, not all mutations that accumulate are beneficial; neutral and 
nearly neutral mutations can spread by recurring mutation, random 
drift, and hitchhiking’ ™*. Selective sweeps will purge some neutral 
Figure 2 | Phylogenetic trees for LTEE populations. a, Phylogenies for
22 genomes from each population, based on point mutations. b, The
same trees, except branches are rescaled as follows: branches for lineages
with mismatch-repair defects are orange and shortened by a factor of
25; branches for mutT mutators are red and shortened by a factor of 50.
Strain REL606 (on the left) is the ancestor. No early mutations are shared
between any populations, confirming their independent evolution. Most
populations have multiple basal lineages that reflect early diversification
and extinction; some have deeply divergent lineages with sustained
persistence, most notably Ara−2.


mutations but cause others to increase; overall, the expected number 
of neutral mutations should increase linearly with time*. 

To test these predictions, we fit three models to the trajectory for the 
total number of mutations in the non-mutator and premutator lineages: 


m = at
m = b√t
m = at + b√t


where m is the number of mutations, t is time (generations), and a 
and b govern the genome-wide rates of accumulation of neutral and 
beneficial mutations, respectively (Fig. 3). (Extended Data Fig. 3 shows 
the models fit to each population separately.) Using the Akaike infor- 
mation criterion (AIC), the two-parameter model fits the data much 
better than those with only the linear (ΔAIC = −77.7) or square-root
(ΔAIC = −99.7) terms. Because the one-parameter models are nested
within the two-parameter model, we can also assess the significance of 
adding the second parameter; P values are 7.5 x 107° and 5.2 x 107” 
relative to the linear and square-root models, respectively. The trajec- 
tory for genome evolution thus shows signatures of both adaptive and 
non-adaptive changes. However, the model that predicts the square- 
root trajectory of beneficial substitutions makes various assumptions 
(for example, about the form of epistasis), and both the predicted and 
observed trajectories have statistical uncertainties. (Extended Data Fig. 4 
shows the uncertainty in estimating a and b from the observed trajec- 
tory.) Therefore, we examined additional evidence to shed light on the 
proportion and identity of beneficial mutations. 
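
The model comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: it assumes ordinary least squares with Gaussian errors, computes AIC as n·ln(RSS/n) + 2k, and runs on synthetic counts generated from the composite model with the a and b values reported in the Fig. 3 legend plus small, made-up offsets. All function and variable names are hypothetical.

```python
import math


def fit_linear(ts, ms):
    """Least-squares a for m = a*t."""
    return sum(t * m for t, m in zip(ts, ms)) / sum(t * t for t in ts)


def fit_sqrt(ts, ms):
    """Least-squares b for m = b*sqrt(t)."""
    return sum(math.sqrt(t) * m for t, m in zip(ts, ms)) / sum(ts)


def fit_composite(ts, ms):
    """Least-squares (a, b) for m = a*t + b*sqrt(t) via the 2x2 normal equations."""
    s_tt = sum(t * t for t in ts)
    s_ts = sum(t * math.sqrt(t) for t in ts)
    s_ss = sum(ts)
    s_tm = sum(t * m for t, m in zip(ts, ms))
    s_sm = sum(math.sqrt(t) * m for t, m in zip(ts, ms))
    det = s_tt * s_ss - s_ts * s_ts
    a = (s_tm * s_ss - s_ts * s_sm) / det
    b = (s_tt * s_sm - s_ts * s_tm) / det
    return a, b


def aic(ts, ms, predict, n_params):
    """AIC for Gaussian errors, n*ln(RSS/n) + 2k (additive constants dropped)."""
    rss = sum((m - predict(t)) ** 2 for t, m in zip(ts, ms))
    n = len(ts)
    return n * math.log(rss / n) + 2 * n_params


# Sampled generations from the text; the offsets are made up for illustration.
GENERATIONS = [500, 1000, 1500, 2000, 5000, 10000, 15000, 20000, 30000, 40000, 50000]
OFFSETS = [0.4, -0.3, 0.2, -0.5, 0.3, -0.2, 0.5, -0.4, 0.1, -0.1, 0.3]
A_REF, B_REF = 0.000944, 0.134856  # values quoted in the Fig. 3 legend
counts = [A_REF * t + B_REF * math.sqrt(t) + e for t, e in zip(GENERATIONS, OFFSETS)]

a_lin = fit_linear(GENERATIONS, counts)
b_sqrt = fit_sqrt(GENERATIONS, counts)
a_both, b_both = fit_composite(GENERATIONS, counts)

aic_linear = aic(GENERATIONS, counts, lambda t: a_lin * t, 1)
aic_sqrt = aic(GENERATIONS, counts, lambda t: b_sqrt * math.sqrt(t), 1)
aic_composite = aic(GENERATIONS, counts, lambda t: a_both * t + b_both * math.sqrt(t), 2)

print(f"AIC(composite) - AIC(linear) = {aic_composite - aic_linear:.1f}")
print(f"AIC(composite) - AIC(sqrt)   = {aic_composite - aic_sqrt:.1f}")
```

Negative differences favour the composite model. The magnitudes printed here depend entirely on the synthetic offsets and should not be compared with the ΔAIC values reported in the text.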


Evidence for beneficial mutations 

We sought to understand what proportion of the genomic changes in 
the non-mutator populations was adaptive, and how that proportion 
changed over time. One line of evidence derives from the expecta- 
tion that synonymous substitutions—point mutations in protein-cod- 
ing genes that do not affect the amino-acid sequence—are neutral 
and should therefore accumulate at a rate equal to the underlying 
Figure 3 | Alternative models fit to the trajectory of genome evolution.
Each symbol shows total mutations in a clone from five populations
that never became mutators and seven before point mutation or IS150
hypermutability evolved. Colours are the same as in Fig. 1; open triangles
indicate grand means. Dashed grey line shows the best fit to the linear
model, m = at. Solid grey curve shows the fit to the square-root model,
m = b√t. Black curve is fit to the composite model, m = at + b√t, where
a = 0.000944 and b = 0.134856. See text for statistical analysis.


mutation rate’. This expectation is not strictly true owing to selec- 
tion on codon usage, RNA folding, and other effects, but it is gener- 
ally thought that such selection is extremely weak, affects only a small 
fraction of sites at risk for synonymous mutations, or both**3”. We 
calculate whether nonsynonymous and intergenic point mutations are 
found in excess relative to synonymous mutations, given the number 
of sites at risk for each class. Figure 4a shows the number of synony- 
mous mutations in non-mutator and premutator populations, scaled 
so the mean at 50,000 generations is unity. As expected, synonymous 
mutations accumulated at an approximately constant rate (Extended 
Data Fig. 5). Figure 4b shows the number of nonsynonymous mutations 
relative to the neutral expectation based on synonymous mutations. 
Nonsynonymous mutations accumulated ~17.1 times faster than 
synonymous ones during the first 500 generations and ~3.4 times 
faster over 50,000 generations. Nonsynonymous mutations continued 
to accumulate at over twice the rate of synonymous mutations in the 
later generations (Extended Data Fig. 6), implying that most nonsyn- 
onymous mutations that reached high frequency were beneficial even 
after so long in a constant environment. The same approach applied to 
intergenic point mutations (Fig. 4c) also reveals a large excess relative 
to synonymous mutations, although the number of events is smaller 
and the uncertainty greater. This result implicates adaptive changes in 
noncoding regions that presumably affect the binding sites for regu- 
latory proteins**, 

Synonymous mutations provide an internal benchmark for non- 
synonymous and intergenic point mutations. However, synonymous 
mutations are not directly informative for understanding how selec- 
tion affects the accumulation of indels that comprise almost half the 
mutations in non-mutator clones at 50,000 generations (Extended Data 
Fig. 7). To estimate the proportion of beneficial changes for other 
types of mutation, we compare the LTEE and a mutation accumulation 
experiment (MAE) in which 15 lines were propagated via repeated 
single-cell bottlenecks*!. Such bottlenecks eliminate the variation 
needed for natural selection, so that all types of mutations accumulate 
Figure 4 | Trajectories for synonymous, nonsynonymous and intergenic
point mutations. a, Synonymous mutations, scaled so that the mean
of five non-mutator populations (excluding point mutation and IS150
hypermutators) is unity at 50,000 generations. b, Nonsynonymous
mutations, scaled using the same rate as synonymous mutations after
adjusting for sites at risk for both classes. c, Intergenic point mutations,
scaled using the same rate as synonymous mutations after adjusting for
sites at risk. Each symbol shows the mean for sequenced genomes from
a non-mutator or premutator lineage. Colours are as in Fig. 1. Note the
discontinuous scale; populations with zero mutations are plotted below.
Black lines connect grand means; shading shows standard errors calculated
from replicate populations.


at the rates at which they happen, regardless of fitness effects, except for 
lethal or highly deleterious mutations that preclude cells from making 
colonies used to propagate lines”. MAE lines thus provide an external 
baseline for distinguishing beneficial and non-beneficial mutations. In 
fact, because more unselected mutations are deleterious than benefi- 
cial, MAE lines are expected to lose fitness over time, which they did 
(Extended Data Fig. 8). 

To quantify the relative rates for all types of mutations in the absence 
of selection, we sequenced clones from the MAE lines after 550 daily 
bottlenecks (Supplementary Data 1). Consistent with the random 
accumulation of mutations, the number of nonsynonymous (including 
nonsense) mutations was similar to the expectation based on synon- 
ymous mutations (117 observed, 105.02 expected); the resulting ratio 
of 1.11 is well within the 95% confidence interval (0.70-1.50) obtained 
by a randomization test. Also, there was no among-line variation in 
total mutations (χ² = 5.46, degrees of freedom (df) = 14, P = 0.978).
We can therefore reasonably use the MAE lines to estimate relative 
rates of different types of mutations, with synonymous ones providing 
a benchmark largely free of selection in both experiments. For example, 
LTEE population Ara−1 had 21 nonsynonymous mutations at
20,000 generations and the expected number of synonymous muta- 
tions based on the average non-mutator population was 1.08 (Extended 
Data Fig. 5); the 15 MAE lines in total had 117 nonsynonymous and 
39 synonymous mutations; thus, the ratio of observed mutations 
to the neutral expectation is (21/1.08)/(117/39) ≈ 6.5. These ratios
show that all major classes of mutations—including various indels—are 
substantially overrepresented in the LTEE relative to the MAE 
(Extended Data Fig. 9), implying that many mutations in each class 
were adaptive during the LTEE. 
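
The worked example for Ara−1 can be reproduced with a few lines of code. This is a minimal sketch of the ratio as described in the text; the function and argument names are hypothetical.

```python
def excess_ratio(ltee_observed, ltee_expected_synonymous, mae_class_count, mae_synonymous_count):
    """Observed LTEE mutations of one class relative to the neutral expectation.

    The LTEE count is divided by the expected number of synonymous mutations,
    and that quotient is normalized by the ratio of the same two classes in
    the MAE lines, where selection is minimal.
    """
    ltee_relative = ltee_observed / ltee_expected_synonymous
    mae_relative = mae_class_count / mae_synonymous_count
    return ltee_relative / mae_relative


# Values from the text: Ara-1 at 20,000 generations and the 15 MAE lines.
ratio = excess_ratio(21, 1.08, 117, 39)
print(f"{ratio:.1f}")  # prints 6.5
```

A ratio near 1 would indicate accumulation at the neutral rate; values well above 1 point to an excess consistent with positive selection.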


Parallel evolution at many gene loci 

Parallel evolution occurs when similar changes arise independently 
in multiple lineages, and it is often used to discover putative targets 
of selection**!°-!37!, Genetic parallelism can be studied at the level 
of DNA sequence, affected genes, or integrated functions. Parallelism 
at the nucleotide level tends to be rare because different mutations 
in a gene often produce similar benefits*!*"!*!, although there are 
exceptions’. Parallelism at a functional level requires detailed under- 
standing that may be unavailable, and it is difficult to interpret when 
there are many mutations. We therefore examined parallelism at the 
gene level. 
Table 1 | Protein-coding genes with the highest G scores

Gene                        Length  Observed  Expected  G score  Annotation
pykF                        1,413   19        0.16      181      Pyruvate kinase
iclR                        825     13        0.10      128      Transcriptional repressor, glyoxylate bypass
spoT                        2,109   14        0.25      113      Stringent response
nadR                        1,233   12        0.14      106      Bifunctional transcriptional repressor and NMN adenylyltransferase
hslU                        1,332   11        0.15      94       Molecular chaperone and ATPase component of protease
yijC (also known as fabR)   705     7         0.08      62       Transcriptional repressor, fatty acid and phosphatidic acid pathway
topA                        2,598   8         0.30      52       DNA topoisomerase I subunit
malT                        2,706   8         0.31      52       Transcriptional activator, maltotriose-ATP-binding
mrdA                        1,902   7         0.22      48       Transpeptidase in peptidoglycan synthesis
mreB                        1,044   6         0.12      47       Longitudinal peptidoglycan synthesis
infB                        2,673   7         0.31      44       Translation initiation factor IF-2
arcA                        717     5         0.08      41       Response regulator in two-component system, anoxic redox control
argR                        471     4         0.05      34       Repressor of arginine regulon
rplF                        534     4         0.06      33       50S ribosomal subunit protein
mreC                        1,104   4         0.13      28       Longitudinal peptidoglycan synthesis


Genes are ranked by G scores computed using observed independent nonsynonymous mutations relative to expected number given gene length (bp). Data are from populations with the ancestral 


point-mutation rate throughout and other populations before they evolved hypermutability. 


We focused on lineages that retained the ancestral point-mutation 
rate (including clones from populations that later became hypermu- 
table) because, as shown earlier, most mutations are drivers in those 
cases; we expect hypermutability to make the analysis less informative 
because many more mutations are passengers. We first calculated the 
expected number of nonsynonymous mutations for each single-copy 
protein-coding gene based on its length as a fraction of all such genes 
and the total number of nonsynonymous mutations in the relevant 
lineages (Supplementary Data 2). We computed G scores for good- 
ness of fit between observed and expected values; the total score is 
2,593.7. We compared that total with simulated data sets in which 
positions of mutations in the coding genome were randomized, and the 
observed total significantly exceeded the simulations (mean simulated 
G = 1,933.7, Z = 25.5, P < 10^-143). Fifty-seven genes had two or more 
mutations; these genes had 50.1% of the nonsynonymous mutations 
but constituted only 2.1% of the coding genome. (Only one gene 
had multiple synonymous changes.) Table 1 shows the 15 genes that 
contribute the most to the total G score. Several encode proteins with 
core metabolic or regulatory functions, including three involved in 
peptidoglycan synthesis. 

We ran the same analysis for lineages that evolved hypermutability 
(Supplementary Data 3), and the randomization test indicates 
significant parallelism (G statistic = 5,098.4, mean simulated G = 4,581.1, 
Z = 5.745, P < 10^-8). As expected, however, the signal-to-noise ratio 
reflected in the significance level is much weaker than for the non- 
mutator lineages. Most genes with the highest scores in mutator 
lineages differ from those in non-mutators, in part because those genes 
often had beneficial mutations before hypermutability evolved. 

Table 2 | Genes with the most mutations of other types 

| Genes | Mutations | Number | IS | MAE | Annotation |
|---|---|---|---|---|---|
| rbsD | Mostly large deletions | 41 | Yes | No | D-Ribose utilization; most deletions affect entire rbs operon |
| nupC | Various intergenic | 5 | Yes | Yes | Nucleoside transporter |
| iap | Mostly large indels | 9 | Yes | No | Alkaline-phosphatase isozyme conversion; most indels affect tens of adjacent genes including rpoS, which encodes stationary-phase σ factor |
| mokB | Various indels | 7 | Yes | Yes | Enables hokB toxin expression |
| yhgI/gntT | Intergenic point mutations | 6 | No | No | Gluconate transport |
| mokC | Various indels | 5 | Yes | Yes | Enables hokC toxin expression |
| ybcU (also known as borD) | Large indels | 4 | Yes | No | Indels affect this and adjacent remnants of DLP12 prophage |
| ECB_02013 | Various indels | 4 | No | Yes | Indels affect this and adjacent remnants of P2-like prophage |
| ECB_02816 (also known as kpsD) | Various indels | 4 | Yes | No | Polysialic-acid transport protein precursor |
| acs/nrfA | Various intergenic | 4 | No | No | Acetyl-CoA synthase; nitrite reductase |
| hokE | Large indels | 2 | Yes | No | Toxin in plasmid-derived toxin-antitoxin system; most indels affect several adjacent genes involved in iron acquisition |
| ybeB/phpB | Various intergenic | | Yes | No | Unknown functions, but adjacent to genes involved in cell-wall synthesis |
| ydiJ/ydiK | Various intergenic | | No | No | Predicted FAD-linked oxidoreductase; putative inner membrane protein |
| ldrC | Various indels | 0 | Yes | Yes | Small toxic polypeptide |
| menC | IS insertions | 0 | Yes | Yes | Menaquinone biosynthesis |
| fimA | Mostly IS insertions | 0 | Yes | No | Component of fimbrial complex |

Genes are ranked by total mutations excluding nonsynonymous and synonymous point mutations. When two genes are separated by a solidus, the affected sequence includes the intergenic region between them. IS column indicates whether the majority of mutations involve IS elements. MAE column indicates whether the same or nearly identical mutations occurred in one or more MAE lines. Data are from populations with the ancestral point-mutation rate throughout and others before they evolved hypermutability. 

Table 2 lists the 16 genes with the most deletions, duplications, 
insertions and intergenic point mutations in non-mutator lineages 
(Supplementary Data 2). For mutations that impact multiple genes, 
we show the most frequently affected gene (or adjacent pair when 
most events are intergenic). In 12 cases, the majority of the mutations 
were mediated by IS elements; these include insertions as well as 
deletions and duplications that appear to involve homologous recom- 
bination. In six cases (five with IS insertions), the same or nearly 
identical mutations occurred in one or more MAE lines, suggesting 
mutational hotspots. These changes may indicate high-frequency 
events, but recall that IS insertions and large indels are enriched in 
the LTEE relative to the MAE (Extended Data Fig. 9), implying that 
many are also beneficial. Indeed, the IS-mediated rbsD deletions 
occur at a high rate and are beneficial in the LTEE environment, 
and some IS-mediated mutations appear to be beneficial in other 
studies as well. 

The parallelisms involving nonsynonymous substitutions and other 
mutations in the LTEE, coupled with their high rates of accumulation 
relative to the MAE, indicate that many observed mutations were 
drivers of adaptation. For indels, however, the specific target genes are 
difficult to identify owing to the multiplicity of genes affected and the 
potentially confounding effect of mutational hotspots. 


Discussion 

Adaptation by natural selection sits at the heart of phenotypic evolu- 
tion. However, the random processes of spontaneous mutation and 
genetic drift often overwhelm and obscure genomic signatures of 
adaptation. We overcame this difficulty by analysing genomes from 
12 bacterial populations that evolved for 50,000 generations under 
identical culture conditions. Even so, six populations evolved hyper- 
mutable phenotypes that increased point-mutation rates ~100-fold, 
and another evolved hypermutability caused by a transposable element. 
By focusing on populations that retained the ancestral mutation 
rate, we identified several key features of the tempo and mode of 
their genome evolution. First, a population-genetic model with two 
terms—one for beneficial drivers, the other for neutral hitchhikers— 
fits the dynamics much better than models without both terms. 
Second, the great majority of mutations observed during the early 
generations were beneficial drivers. Third, the proportion of observed 
mutations that were beneficial declined over time but remained sub- 
stantial even after 50,000 generations. The second and third findings 
follow from the population-genetic model. Both are also strongly sup- 
ported by the excess of nonsynonymous to synonymous substitutions 
in the LTEE and by the excess of several classes of mutations, including 
indels, in comparison to mutation-accumulation lines. Fourth, there 
was strong gene-level parallel evolution across the replicate LTEE 
populations. 

Our analyses also show a contrast between the contributions 
of beneficial mutations to molecular evolution and to the fitness 
trajectory in a stable environment. In particular, beneficial mutations 
continued to constitute a large fraction of genetic changes throughout 
the 50,000 generations of the LTEE, whereas the resulting fitness 
gains were only a few per cent in the last 10,000 generations. 
Beneficial mutations with very small selection coefficients are 
nonetheless visible to natural selection. Hence, adaptation can remain 
a major driver of molecular evolution long after an environmental 
shift. Our experimental results thus support a selectionist view of 
molecular evolution, complementing indirect evidence based on 
comparative genomics in bacteria, Drosophila and humans. 
Of course, the LTEE may differ from many natural populations in 
important respects including its low mutation rate, the absence of sex 
or horizontal gene transfer, and a stable environment. As we showed, 
high mutation rates tend to obscure the role of selection in molecular 
evolution. The effects of horizontal gene transfer and variable 
environments on the dynamic coupling of genomic and adaptive 
evolution should also be examined further. Long-term experiments 
with microorganisms provide opportunities for rigorous analyses of 
these issues. 

METHODS 


Long-term evolution experiment. The LTEE has 12 populations founded from 
two almost identical strains of Escherichia coli. Six populations, designated 
Ara-1 to Ara-6, started from REL606, a descendant of the B strain of Luria and 
Delbrück. The other six, Ara+1 to Ara+6, derive from REL607, which differs 
from REL606 by point mutations in araA and recD. The mutation in araA was 
selected before starting the LTEE; it confers the ability to grow on L-arabinose, 
which provides a marker in competition assays used to measure fitness. The 
recD mutation arose inadvertently before starting the LTEE. The LTEE began in 
1988, and the populations have been propagated (with occasional interruptions) 
at 37 °C by daily 100-fold dilutions in 10 ml Davis minimal medium with 25 μg/ml 
glucose (http://lenski.mmg.msu.edu/ecoli/dm25liquid.html). The regrowth allows 
~6.67 generations per day; the population size fluctuates between ~3 × 10^6 and 
~3 × 10^8 cells except in population Ara-3, which has had a population size 
several times larger since ~33,000 generations, when cells gained the ability to 
consume the citrate that is also present in the medium. Whole-population 
samples are taken every 75th transfer (500 generations) and stored with glycerol 
as a cryoprotectant at -80 °C, where they are available for later analysis. Here 
we analysed the genomes of two clones sampled from each population at 500, 
1,000, 1,500, 2,000, 5,000, 10,000, 15,000, 20,000, 30,000, 40,000 and 50,000 gen- 
erations (Supplementary Data 1). We deliberately included clones from the deeply 
diverged lineages in population Ara-2 from 20,000 generations onwards and both 
the majority Cit+ lineage and the minority Cit- lineage in population Ara-3 at 
generation 40,000. This sampling scheme does not affect inferences about the rates 
and patterns of genome evolution because both populations were hypermutable 
at these time points and thus excluded from the main analyses. These clones were 
included to illustrate diversity within populations, although we also found previ- 
ously unknown cases of divergent lineages. No statistical methods were used to 
predetermine sample size. The experiments were not randomized. The investiga- 
tors were not blinded during experiments and outcome assessment. 
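
As a quick check on the propagation arithmetic above, the number of generations per transfer follows from the dilution factor, because regrowth after a 100-fold dilution requires log2(100) doublings. The sketch below is illustrative only; the function name is an assumption and not part of the LTEE protocol.

```python
import math


def generations_per_transfer(dilution_factor: float) -> float:
    """Doublings needed to regrow after a dilution: log2 of the dilution factor."""
    return math.log2(dilution_factor)


per_day = generations_per_transfer(100)  # about 6.64, quoted as ~6.67 in the text
per_sample = 75 * per_day                # about 498, conventionally rounded to 500
print(f"{per_day:.2f} generations per day, {per_sample:.0f} per 75 transfers")
```

The small gap between 498 and the nominal 500 generations reflects rounding in the conventional figure of ~6.67 generations per day.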
Mutation-accumulation experiment. The 15 MAE lines analysed here started 
from strain REL1207, which is an Ara+ mutant of a clone sampled from LTEE 
population Ara-1 at 2,000 generations. REL1207 differs from REL606 by a total 
of eight mutations, including one in araA that confers the Ara+ marker phenotype. 
Each line was propagated through 550 single-cell bottlenecks by picking a colony at 
random from a Davis minimal agar plate with glucose at 200 μg/ml and streaking 
the cells onto a fresh plate. Given ~25 cell doublings to produce a typical colony, 
the 550 cycles represent ~13,750 generations. The bottlenecks imposed by this 
procedure eliminate the genetic variation that fuels adaptation by natural selection; 
as a consequence, mutations accumulate at rates that depend on their underlying 
mutation rate but not their fitness effects, except for highly deleterious mutations 
that preclude sufficient growth to form a colony. Because more mutations are 
deleterious than are beneficial, fitness declined under this regime (Extended Data 
Fig. 8). The 15 sequenced clonal isolates, each from a different MAE line, are 
JEB807-JEB821 (Supplementary Data 1). None of the lineages became hypermu- 
table based on their mutational signatures and the absence of significant hetero- 
geneity in the total mutations accumulated (see main text). However, the mean 
per-generation rate at which synonymous mutations arose was ~3.5-fold higher 
in the MAE lines than in the five LTEE populations that remained non-mutators 
for all 50,000 generations (Supplementary Data 4; t = 3.0755, P = 0.0065). This 
difference may reflect the different conditions in liquid and agar media, including 
the glucose concentration and local cell density, which might affect the reactive 
oxygen species that cells experience. The comparisons between the LTEE and 
MAE (Extended Data Fig. 9) would change if the underlying rates of the various 
types of mutation responded disproportionately to the different conditions in the 
MAE. That possibility seems implausible for the different classes of point mutation 
(Extended Data Fig. 9a, b), and the differences would have to be substantially 
larger than the different rates of synonymous mutations to produce the excess 
IS150 insertions (Extended Data Fig. 9c) and large indels (Extended Data Fig. 9f) 
observed in the LTEE relative to the MAE. 

Genome sequencing. Frozen samples from the LTEE and MAE were revived via 
overnight growth at 37°C in either LB or Davis minimal medium supplemented 
with 1,000 μg/ml glucose. Genomic DNA was isolated from each culture using the 
Qiagen Genomic-tip 100/G kit or equivalent. The DNA samples were sequenced 
at Genoscope or Integragen SA (Evry, France), the Michigan State University 
Research Technology Support Facility (East Lansing, USA), or the University of 
Texas at Austin Genome Sequencing and Analysis Facility (Austin, USA). Illumina 
Genome Analyzer and HiSeq instruments were used to generate single-end or 
paired-end reads ranging in length from 35 to 150 bases according to standard 
procedures, with median coverage of 80-fold and 95-fold for the 264 LTEE and 15 
MAE clones, respectively (Supplementary Data 1). Of the 264 LTEE genomes in 
this study, 40 were previously analysed in other studies. Supplementary 
Data 4 shows the number of every type of mutation inferred after performing 
the analyses described below on each of the LTEE and MAE genomes used in 
this study. 

Mutation calling. We used breseq (versions 0.26.0 to 0.27.0) to predict both 
single-nucleotide and structural differences based on how the Illumina reads 
for each sample mapped to the genome sequence of E. coli B REL606 (GenBank 
accession NC_012967.1). We counted and classified mutations using an updated 
version of the REL606 reference genome with improved feature annotations. The 
updated genome file (in both GenBank and GFF3 formats) and lists of predicted 
mutations in each evolved genome (in the Genome Diff format described in an 
appendix to the breseq manual) are freely available online (http://github.com/ 
barricklab/LTEE-Ecoli). 

Most types of single-step mutations, including large deletions and transposition 
events leading to copies of IS elements at new positions in the genome, are directly 
predicted by breseq when they occur in non-repetitive genomic regions. The initial 
lists of predicted mutations were curated and refined as previously described. 
Briefly, complex mutations involving multiple steps (such as a new IS insertion 
followed by a flanking deletion) and structural mutations that overlap repetitive 
regions of the genome were manually resolved from unassigned new junction and 
missing coverage evidence in the breseq output. Large duplications and amplifica- 
tions were detected by examining the coverage depth of mapped reads across the 
reference genome and comparing this information with the positions of repeat 
sequences and unassigned junctions. Owing to limitations of short-read DNA 
sequencing data, we could not fully predict point mutations and indels of one to a 
few base pairs within repeat regions (for example, IS elements) or gene conversions, 
in which intragenomic recombination between nearly identical copies of a large 
repeat region (for example, the seven copies of the rRNA operon) converts a minor 
variation in one copy to match exactly the sequence of another copy. Instead, all 
such genetic changes in repetitive regions of the genome were uniformly ignored 
in downstream analyses, as described later. 

To validate the final lists of mutations predicted in each clone, we applied 
these changes to the ancestral REL606 sequence and used breseq to compare the 
Illumina reads against this simulated evolved genome to verify there were no 
further, unexplained discrepancies. This step of applying mutations to the reference 
genome was also used to estimate the final genome size of each evolved clone, with 
the assumption that new IS insertions were of the most common size for that IS 
element in the reference genome. 

For 6 of the 264 LTEE samples, there was evidence of non-clonality in the 
sequence data. Some samples appeared to be mixtures of two very closely related 
clones that shared nearly all mutations but had one to several mutations specific 
to each type, together adding to a frequency of 100% (for example, sets of muta- 
tions at frequencies of 35% and 65%). This situation might result from inadvert- 
ently sampling two adjacent colonies on an agar plate when picking clones from 
an LTEE population. In other cases, only one or two mutations were found at 
an intermediate frequency. This type of heterogeneity might arise from strong 
selection favouring new mutations during colony outgrowth, subculturing and 
revival of samples before DNA extraction, as these conditions differ from the 
LTEE. In each case, we reconstructed the major genotype in the sample, as noted 
in Supplementary Data 1. 

We also ignored putative genome variation associated with a cryptic 186-like 
prophage element (REL606 genome coordinates 880528-904682). In ten of the 
LTEE populations, we observed clones with increased read-coverage depth of 
this region and reads spanning a new sequence junction consistent with either 
tandem head-to-tail amplifications of this region or the production of circular 
DNA molecules joined at these exact nucleotides. The changes in the apparent 
copy number of this region often deviated from the integer values expected for 
a stable duplication or amplification. The prophage-related changes in coverage 
appeared most often in genomes isolated from 2,000 generations or earlier in the 
LTEE. There is no evidence of infective phage production in the LTEE, but it is 
possible that replication of DNA encoding a defective phage occurs stochastically 
at some low level in the ancestral strain REL606 or that production of this DNA is 
induced by stress when culturing samples for DNA isolation. 

Phylogenetic consistency. Owing to the long duration of the LTEE and the evolution 
of mutators in several lineages, some mutations may be hidden or initially grouped 
with other mutations into a single change when comparing a late-generation 
evolved genome with the ancestral genome. For example, a point mutation might 
occur early in the experiment and then the region containing that mutation is 
later deleted. Similarly, the deletion of one base early and the subsequent deletion 
of an adjacent base would be called as a single two-base deletion in later samples. 
To obtain more accurate counts in light of these issues, we used each population's 
inferred phylogeny to split or add mutations, as appropriate, so that the mutation 
list for each clone reflects the most parsimonious set of mutational steps between 
that clone and its ancestor. Specifically, we chose histories with the fewest total 
mutations, the fewest mutations on early branches (in case of ties), and the fewest 
total nucleotide changes summed over all mutations. Because this procedure is con- 
servative in adding mutations to achieve phylogenetic consistency, it might under- 
estimate the number of mutations on branches leading to an evolved genome when 
intermediate states are not resolved by the relationships of the sequenced clones. 
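
The three tie-breaking criteria described above amount to a lexicographic ordering over candidate histories. The sketch below is one hypothetical way to express that ordering; the `History` record and its field names are illustrative assumptions, and the original curation was not necessarily implemented this way.

```python
from typing import NamedTuple


class History(NamedTuple):
    """One candidate mutational history for a clone (hypothetical record)."""
    label: str
    total_mutations: int         # criterion 1: fewest total mutations
    early_branch_mutations: int  # criterion 2: fewest mutations on early branches
    total_nt_changed: int        # criterion 3: fewest nucleotides changed overall


def most_parsimonious(histories):
    """Pick the history that minimizes the three criteria in order of priority."""
    return min(
        histories,
        key=lambda h: (h.total_mutations, h.early_branch_mutations, h.total_nt_changed),
    )


candidates = [
    History("A", total_mutations=3, early_branch_mutations=2, total_nt_changed=4),
    History("B", total_mutations=3, early_branch_mutations=1, total_nt_changed=5),
    History("C", total_mutations=4, early_branch_mutations=0, total_nt_changed=2),
]
print(most_parsimonious(candidates).label)  # B
```

Comparing tuples makes the priority explicit: a later criterion matters only when all earlier ones are tied.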
Final mutation lists. We performed two final filtering steps to enable the sets of 
mutations to be uniformly compared across all genomes. In doing so, we classified 
as ‘small mutations’ all single-nucleotide substitutions, insertions and deletions 
of 20 or fewer bp, substitutions replacing 20 or fewer bp in the reference 
genome with 20 or fewer other bp, and all simple sequence repeat (SSR) mutations 
regardless of their size. SSR mutations add or remove one or more copies of a 
tandem-repeat unit consisting of one or a few bp. We defined SSR mutations as 
containing at least two copies of the repeat unit and having a total length of at least 
five bp when including all copies of the tandem repeat in the reference genome. 
For example, the genetic changes GGGGG→GGGG, TATATA→TATATATA 
and TACGTTACGT→TACGT would all be classified as SSR mutations, but 
GGGG→GGGGG, TATA→TATATA and TACGT→TACGTTACGT would 
not. All other genomic changes were considered ‘large mutations’ for purposes of 
filtering. 
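
The SSR definition above can be checked mechanically against the six examples. This is a minimal sketch that assumes the repeat unit is already known; `is_ssr` is an illustrative name and not a function from breseq or the authors' pipeline.

```python
def is_ssr(ref_tract: str, unit: str) -> bool:
    """Apply the SSR rule to the tandem-repeat tract in the reference genome.

    The tract must consist of at least two exact copies of the repeat unit
    and be at least 5 bp long in total.
    """
    if not unit or len(ref_tract) % len(unit) != 0:
        return False
    copies = len(ref_tract) // len(unit)
    if unit * copies != ref_tract:
        return False  # tract is not a perfect tandem repeat of the unit
    return copies >= 2 and len(ref_tract) >= 5


# Reference-side tracts from the examples in the text.
print(is_ssr("GGGGG", "G"), is_ssr("TATATA", "TA"), is_ssr("TACGTTACGT", "TACGT"))
print(is_ssr("GGGG", "G"), is_ssr("TATA", "TA"), is_ssr("TACGT", "TACGT"))
```

The first line prints `True True True` and the second `False False False`, matching the classification given in the text: only the reference tract, and not the mutated sequence, determines whether a change counts as an SSR mutation.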

The ability to call small mutations located in repetitive regions of the genome 
is dependent on read length, so we removed all such mutations in regions where 
it would be a problem to uniformly detect them from the mutation lists before 
further analyses. To do this, we enumerated all regions of > 20 bp that had an exact 
match elsewhere in the genome of the ancestral strain REL606 using MUMmer 
v.3.23 (ref. 58). We then merged regions from this list that were separated by five 
or fewer bp. All resulting regions that were now > 35 bp were included in a list of 
masked genomic intervals. We also added to this list a hypervariable SSR consisting 
of seven copies of a tetranucleotide sequence that could not be reliably called in 
data sets with short reads (coordinates 2103889-2103919). Any small mutations 
contained in these masked regions were excluded from all downstream analyses. 
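
The masking procedure above is essentially an interval merge followed by a length filter. The sketch below assumes 1-based inclusive coordinates and takes the repeat regions as given (in the paper they come from MUMmer); the function names and toy coordinates are illustrative.

```python
def build_mask(regions, max_gap=5, min_len=35):
    """Merge repeat regions separated by max_gap or fewer bp, keep those > min_len bp.

    regions: iterable of (start, end) tuples, 1-based and inclusive (assumed).
    """
    merged = []
    for start, end in sorted(regions):
        # Bases strictly between the previous region and this one.
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s + 1 > min_len]


def is_masked(pos, mask):
    """True if a genome position falls inside any masked interval."""
    return any(start <= pos <= end for start, end in mask)


# Toy regions: the first two are 4 bp apart and merge; the third is too short.
mask = build_mask([(1, 25), (30, 60), (100, 125)])
# The hypervariable SSR named in the text is added to the list directly.
mask.append((2103889, 2103919))
print(mask)  # [(1, 60), (2103889, 2103919)]
```

Small mutations would then be dropped whenever `is_masked` returns true for their position.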

Finally, we flagged all nucleotide substitutions or small indels occurring within 
20 bp of the end of an IS element. The sequences directly adjacent to IS elements 
appear to experience an unusually high mutation rate, possibly due to frequent 
transposase cleavage and DNA repair. Mutations at these IS-adjacent sites probably 
have no effect on cellular phenotypes and fitness. We excluded them from the final 
lists of mutations used in all further analyses because they could bias the inferred 
mutational spectra and rates. 
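
A filter of this kind can be sketched as a simple distance check. The version below assumes that "within 20 bp of the end" means positions just outside either boundary of an IS element, with 1-based inclusive coordinates; both the interpretation and the names are assumptions made for illustration.

```python
def near_is_end(pos, is_elements, window=20):
    """Flag a position lying within `window` bp outside either end of an IS element.

    is_elements: iterable of (start, end) tuples, 1-based and inclusive (assumed).
    Positions inside an element are not flagged here; in the text, repeat
    regions are handled separately by the masking step.
    """
    for start, end in is_elements:
        if 0 < start - pos <= window or 0 < pos - end <= window:
            return True
    return False


# Hypothetical IS element occupying positions 1000-2443.
elements = [(1000, 2443)]
print([near_is_end(p, elements) for p in (990, 2460, 970, 1500)])
# [True, True, False, False]
```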

Phylogenetic analyses. To produce the phylogenetic trees shown in Fig. 2, we used 
the point mutations associated with each clone. A minimum-evolution tree was 
built using the Jukes-Cantor one-parameter model. We used this model for two 
reasons. First, the mutator lineages had very different mutational spectra from the 
non-mutators. Second, many mutations seen in non-mutator lineages were 
under positive selection, and so it is appropriate to give the mutations equal weight 
and not, for instance, reduce the importance of transitions relative to transversions. 
The trees were plotted with the R package APE. The composite tree has the star- 
like structure expected for independent evolution of the populations. Therefore, 
trees were made separately for each population and then combined in Fig. 2, which 
allowed multiple basal branches to be placed with the appropriate populations. 

Parallel evolution in non-mutator lineages. For genomes that did not come from 
point-mutation hypermutator lineages (Supplementary Data 1), we examined the 
extent of parallelism at the gene level in two ways. The first approach was based 
only on nonsynonymous mutations, because it is straightforward to quantify the 
overall extent of parallelism, determine the statistical significance of the paral- 
lelism, and rank genes based on their contributions to the significance. For each 
protein-coding gene $i$, we know its length, $L_i$, and the number of independent 
nonsynonymous mutations observed in that gene across all clones from non-mutator 
and premutator lineages, $N_i$. We summed the lengths and relevant mutations 
over all single-copy protein-coding genes in the ancestral genome to obtain $L_{\mathrm{tot}}$ 
(3,920,306) and $N_{\mathrm{tot}}$ (457, including two mutations that each affected overlapping 
reading frames), respectively. We computed the expected number of mutations in 
each gene, $E_i$, as follows: 

$$E_i = N_{\mathrm{tot}} \left( L_i / L_{\mathrm{tot}} \right)$$

We then computed a $G_i$ score for each gene for which $N_i > 0$ as follows: 

$$G_i = 2 N_i \log \left( N_i / E_i \right)$$

We set $G_i = 0$ for those genes for which $N_i = 0$. This analysis ignores variability 
among genes in the proportion of sites at risk for nonsynonymous mutations. 
However, such differences are small and should hardly affect the analysis. The total 
G statistic equals the sum of the scores over all genes. To compute the expected 
G statistic under the null hypothesis of a random distribution of mutations, we 
generated 1,000 simulated data sets in which $N_{\mathrm{tot}}$ mutations were randomly 
placed throughout the coding genome. We computed the total G statistic for each 
simulated data set, and we calculated its mean and standard deviation across the 
1,000 simulations. To assess the significance of the observed G statistic, we com- 
puted the Z score as the difference between the observed and mean simulated values, 
divided by the standard deviation of the simulated values. Supplementary Data 2 
lists each gene and the information used to calculate its G score. Table 1 shows the 
15 genes with the highest G scores. 
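
The G-score calculation and randomization test described above can be sketched in a few functions. This is a hedged illustration, not the authors' code: the function names are invented, the logarithm is taken to be natural (which gives about 180 for pykF, close to the 181 in Table 1), and converting Z to a one-tailed P value through the normal distribution is an assumption, because the text does not say how P was obtained.

```python
import math
import random


def g_score(n_obs, expected):
    """G_i = 2 N_i ln(N_i / E_i); defined as 0 when no mutations were observed."""
    return 2.0 * n_obs * math.log(n_obs / expected) if n_obs > 0 else 0.0


def total_g(counts, lengths):
    """Sum of G scores over all genes, with E_i = N_tot * L_i / L_tot."""
    n_tot = sum(counts.values())
    l_tot = sum(lengths.values())
    return sum(g_score(counts.get(g, 0), n_tot * L / l_tot) for g, L in lengths.items())


def randomization_test(counts, lengths, n_sim=1000, seed=0):
    """Compare the observed total G with mutations placed at random by gene length."""
    rng = random.Random(seed)
    genes = list(lengths)
    weights = [lengths[g] for g in genes]
    n_tot = sum(counts.values())
    sims = []
    for _ in range(n_sim):
        sim = {}
        for g in rng.choices(genes, weights=weights, k=n_tot):
            sim[g] = sim.get(g, 0) + 1
        sims.append(total_g(sim, lengths))
    mean = sum(sims) / n_sim
    sd = math.sqrt(sum((s - mean) ** 2 for s in sims) / (n_sim - 1))
    observed = total_g(counts, lengths)
    return observed, mean, (observed - mean) / sd


def upper_tail_p(z):
    """One-tailed normal P value for a Z score (assumed conversion)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Single-gene check with the values from the text and Table 1 (pykF).
e_pykf = 457 * 1413 / 3920306
print(round(e_pykf, 2), round(g_score(19, e_pykf)))
```

Calling `randomization_test` with a dictionary of per-gene mutation counts and a dictionary of gene lengths returns the observed total, the simulated mean, and the Z score, mirroring the three quantities reported in the main text.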

Supplementary Data 2 also shows other categories of mutation in or near each 
protein-coding gene including synonymous mutations, intergenic point mutations 
(between any particular gene and one of its immediately adjacent genes), 
IS insertions, small indels (<50 bp), large deletions (>50 bp) and long duplications 
(>50 bp). Table 2 shows the 16 genes that had the most total deletions, duplications, 
insertions and intergenic point mutations (that is, all mutations except synonymous 
and nonsynonymous mutations in the coding gene itself). 
Parallel evolution in mutator lineages. We examined parallel changes in lineages 
that evolved point-mutation hypermutability by analysing nonsynonymous 
substitutions as above. To identify mutations that occurred after a lineage became 
hypermutable (Supplementary Data 3), we subtracted the mutations that occurred 
on non-mutator branches from the total mutations. This approach may result in a 
few mutations that arose before hypermutability being included in the counts for 
mutator lineages, but given the large increases in the point-mutation rate in the 
mutators (Fig. 1) it provides a reasonable approximation. 

Box-and-whiskers plot showing the distribution of average genome 
length (Mb) for each of the 12 LTEE populations based on the two clones 
sequenced at each time point shown from 500 to 50,000 generations. 
The red line shows the length of the ancestral genome. The boxes are the 
interquartile range (IQR) of the data (25th to 75th percentiles); the thick 
black lines are medians; the whiskers extend to the outermost values that 
are within 1.5 times the IQR; and the points show all outlier values beyond 
the whiskers. 

serves as a proxy for the underlying point-mutation rate. All four of 

Extended Data Figure 3 | Alternative models fit to trajectory of genome 
evolution for each LTEE population. a, Ara-1. b, Ara+1. c, Ara-2. 
d, Ara+2. e, Ara-3. f, Ara+3. g, Ara-4. h, Ara+4. i, Ara-5. j, Ara+5. 
k, Ara-6. l, Ara+6. Each symbol shows the total mutations in a sequenced 
genome; in many cases, the symbols for the two genomes from the same 
population and generation are not distinguishable because they have 
the same, or almost the same, number of mutations. For the populations 
that evolved hypermutability, data are shown only for time points before 
mutators arose. In each panel, the dashed grey line shows the best fit to 
the linear model; the solid grey curve shows the best fit to the square-root 
model; and the solid black curve shows the best fit to the composite model 
with both linear and square-root terms. 

Extended Data Figure 4 | Uncertainty in parameter estimation for the 
model describing the rates of accumulation for neutral and beneficial 
mutations. Contours show relative likelihoods for simultaneously 
estimating the linear and square-root coefficients from the observed 
[...] premutator lineages (Fig. 3). The black central point shows the maximum 
likelihood estimates, and the three black contours show solutions 2, 6 and 
10 log units away. The points on the horizontal and vertical axes show 
values for the best one-parameter models. 

Extended Data Figure 5 | Accumulation of synonymous substitutions 
in non-mutator lineages. Each filled symbol shows the mean number 
of synonymous mutations in the (usually two) non-mutator genomes 
from an LTEE population that were sequenced at that time point; 
non-integer values can occur if the two genomes have different numbers. 
Small horizontal offsets were added so that overlapping points are visible. 
Colours are the same as in Fig. 1. Open triangles show the grand means of 
the replicate populations. The grey line extends from the intercept to the 
final grand mean. The slope of that line was used to scale the relative rates 
of synonymous, nonsynonymous and intergenic point mutations in Fig. 4. 

Extended Data Figure 6 | Temporal trend in accumulation of 
nonsynonymous mutations relative to the neutral expectation in 
non-mutator lineages. Interval-specific accumulation of nonsynonymous 
mutations calculated from changes in the total number of nonsynonymous 
mutations between successive samples. As with the cumulative data 
in Fig. 4b, values are scaled by the average rate of accumulation for 
synonymous mutations over 50,000 generations, after adjusting for the 
numbers of genomic sites at risk for nonsynonymous and synonymous 
mutations. Each point shows the average rate calculated for a non-mutator 
or premutator population; small horizontal offsets were added so that 
overlapping points are visible. Note the discontinuous scale; populations 
with no additional mutations over an interval are plotted below. Colours 
are the same as in Fig. 1. Black lines connect grand means; the grey 
shading shows standard errors calculated from the replicate populations. 

types of genetic change for all independent mutations found in the set 
of non-mutator clones that were sequenced at each generation. The total 
[...] nonsense mutations, and the ‘other’ category includes rare point mutations 
in noncoding RNA genes and pseudogenes. 

ancestor (REL1207) or one of the 15 MAE lineages (JEB807-JEB821) and 
the Ara- variant of the MAE ancestor (REL1206). One-day competition 
[...] fitness equals 1. Ten of the fifteen MAE lines experienced significant 
fitness declines, while none had significant gains. 

A multi-modal parcellation of human cerebral cortex


Matthew F. Glasser, Timothy S. Coalson, Emma C. Robinson, Carl D. Hacker, John Harwell, Essa Yacoub,
Kamil Ugurbil, Jesper Andersson, Christian F. Beckmann, Mark Jenkinson, Stephen M. Smith & David C. Van Essen


Understanding the amazingly complex human cerebral cortex requires a map (or parcellation) of its major subdivisions, 
known as cortical areas. Making an accurate areal map has been a century-old objective in neuroscience. Using multi- 
modal magnetic resonance images from the Human Connectome Project (HCP) and an objective semi-automated 
neuroanatomical approach, we delineated 180 areas per hemisphere bounded by sharp changes in cortical architecture, 
function, connectivity, and/or topography in a precisely aligned group average of 210 healthy young adults. We 
characterized 97 new areas and 83 areas previously reported using post-mortem microscopy or other specialized study- 
specific approaches. To enable automated delineation and identification of these areas in new HCP subjects and in future 
studies, we trained a machine-learning classifier to recognize the multi-modal ‘fingerprint’ of each cortical area. This 
classifier detected the presence of 96.6% of the cortical areas in new subjects, replicated the group parcellation, and 
could correctly locate areas in individuals with atypical parcellations. The freely available parcellation and classifier 
will enable substantially improved neuroanatomical precision for studies of the structural and functional organization 
of human cerebral cortex and its variation across individuals and in development, aging, and disease. 


Neuroscientists have long sought to subdivide the human brain into a 
mosaic of anatomically and functionally distinct, spatially contiguous 
areas (cortical areas and subcortical nuclei), as a prerequisite 
for understanding how the brain works. Areas differ from their 
neighbours in microstructural architecture, functional specialization, 
connectivity with other areas, and/or orderly intra-area topographic 
organization (for example, the map of visual space in visual cortical 
areas). Accurate parcellation provides a map of where we are in
the brain, enabling efficient comparison of results across studies and 
communication among investigators; as a foundation for illuminating 
the functional and structural organization of the brain; and as a means 
to reduce data complexity while improving statistical sensitivity and 
power for many neuroimaging studies. 

The human cerebral cortex has been estimated to contain anywhere
from ~50 (ref. 1) to ~200 (refs 3, 4) areas per hemisphere.
However, attaining a consensus whole-cortex parcellation has been 
difficult because of practical and technical challenges that we address 
here. 

Most previous parcellations were based on only one neurobiological 
property (such as architecture, function, connectivity or topography), 
and many cover only part of the cortex. Combining multiple properties 
provides complementary as well as confirmatory information, as 
different properties distinguish different sets of areal boundaries, 
and more confidence can be placed in boundaries that are consistent 
across multiple independent properties. We analysed all four properties
across all of neocortex in both hemispheres, using new or refined
methods applied to the uniquely rich repository of exceptionally
high-quality magnetic resonance imaging (MRI) data provided by
the Human Connectome Project (HCP), which benefited from major
advances in image acquisition and preprocessing. Architectural
measures of relative cortical myelin content and cortical thickness were
derived from T1-weighted (T1w) and T2-weighted (T2w) structural
images. Cortical function was measured using task functional MRI
(tfMRI) contrasts from seven tasks. Resting-state functional MRI
(rfMRI) revealed functional connectivity of entire cortical areas plus
topographic organization within some areas.

Previous parcellations typically used either fully automated 
algorithmic approaches, or else manual or partly automated 
neuroanatomical approaches in which neuroanatomists delineated 
areal borders, documented areal properties, and identified areas after 
consulting prior literature. Here we combined both approaches. For 
the initial parcellation, we adapted a successful observer-independent
semi-automated neuroanatomical approach for generating post-mortem
architectonic parcellations to non-invasive neuroimaging.
We used an algorithm to delineate potential areal borders (transitions
in two or more of the cortical properties described above), which two
neuroanatomists (authors M.F.G. and D.C.V.E.) then interpreted,
documenting areal properties and identifying areas relative to the 
extant neuroanatomical literature. We then used a fully automated 
algorithmic approach, training a machine-learning classifier to 
delineate and identify cortical areas in individual subjects based 
on multi-modal areal fingerprints, allowing the parcellation to be 
replicated in new subjects and studies. 

Prior parcellations have either used small numbers of individuals 
or group averages that are ‘blurry’ from inaccurate alignment of brain 
areas across subjects. We aligned cortical data using ‘areal features’,
including maps of relative myelin content and resting state networks
that are more closely tied to cortical areas than are the folding patterns
typically used for alignment. The markedly improved intersubject
cortical alignment using cortical folding, myelin, and resting state fMRI
enabled us to generate the ‘typical subject’s’ parcellation from a highly
detailed 210-subject group average data set. 
Group map reproducibility of fine details

We analysed two independent groups of HCP subjects, 210P
(‘parcellation’) and 210V (‘validation’), aligned using areal-feature-based
registration (called ‘MSMAll’, see Methods section on image
preprocessing). Figure 1 illustrates the consistency of fine spatial
patterns for maps reflecting relative myelin content (left panels) and
a task-fMRI language-related activation (right panels). The maps
are strikingly similar across the 210P and 210V groups, including
variations in relative myelin content within the primary somatosensory
cortex related to somatotopic organization (white and black arrows
in left panels, see legend) and small features in the task fMRI maps
(white ellipses in right panels). Supplementary Results and Discussion
1.1 and associated Supplementary Figs 1-5 include more examples
of such cross-group consistency for architecture (myelin and cortical
thickness), function (tfMRI contrast maps), and two resting state
connectivity measures. 

Because of the areal-feature-based alignment, maps of average 
cortical folding in this study are much blurrier than are maps of areal 
properties because folding patterns and area locations are imperfectly 
correlated (for example, compare Supplementary Fig. 7e and 7}
(group average and individual subject folding) in Supplementary 
Results and Discussion 1.3). Group average folding patterns remain 
sharp mainly in early sensory areas, where areal locations and folds 
are tightly correlated (for example, central and calcarine sulci, 
see Supplementary Fig. 1, rows 3 and 4, in Supplementary Results and 
Discussion 1.1). The regional difference in sharpness between maps of 
areal properties and folding highlights the importance of alignment 
based on areal features, rather than folding patterns, as a prerequisite for 
accurately parcellating group average data. The high spatial resolution 
of the HCP’s MRI images and lack of aggressive spatial smoothing
prior to group averaging also contribute to making our maps 
substantially sharper than those from traditional neuroimaging studies. 

Quantitatively, the 210P and 210V group average datasets were highly
correlated across the cortical surface (r = 0.998 for myelin; r = 0.994 for
cortical thickness after correction for folding-related effects; r = 0.996
and r = 0.979 for two folding-related measures (FreeSurfer’s ‘sulc’ and
‘curv’, respectively); r = 0.995, r = 0.984 and r = 0.944 for the maximum,
median and minimum, respectively, of the task fMRI contrasts; and a
median reproducibility of r = 0.989 for two measures of resting state
connectivity). These excellent map reproducibilities provide confidence
that the parcellation will reflect the areal pattern of typical subjects in 
the healthy young adult population. See Methods section on modalities 
for parcellation, and Supplementary Methods 1.3-3.4 for the methods 
used to generate these maps. 
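
As an illustration of the kind of map-to-map comparison reported above, the following is a minimal sketch of a Pearson correlation between two group-average maps stored as per-vertex lists. The function name and the toy values are hypothetical and are not taken from the HCP pipeline; the actual procedure is described in the Methods.

```python
import math

def pearson_r(map_a, map_b):
    """Pearson correlation between two equally long lists of per-vertex values."""
    if len(map_a) != len(map_b) or len(map_a) < 2:
        raise ValueError("maps must have the same length (at least 2 vertices)")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in r).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    return cov / math.sqrt(var_a * var_b)

# Toy "myelin" values at five vertices in two independent group averages
# (made-up numbers, for illustration only).
group_p = [1.20, 1.35, 1.80, 1.10, 1.55]
group_v = [1.22, 1.33, 1.78, 1.12, 1.57]
r = pearson_r(group_p, group_v)
```

In practice the two lists would hold one value per cortical surface vertex for each group average.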




Figure 1 | Consistency of fine spatial details in independent group 
averages. Relative myelin content maps (left hemisphere) and task 
fMRI contrast beta maps from the LANGUAGE story contrast (right 
hemisphere) on inflated (columns 1 and 3) and flattened surfaces 
(columns 2 and 4). Rows 1 and 2 are the group averages of the 210P and 
210V data sets, respectively. White and black arrows indicate consistent 
variations in myelin content within primary somatosensory cortex that 
A 180-area group average parcellation

To identify transitions representing candidate areal boundaries, we 
designed and implemented a semi-automated, quantitative approach 
adapted for multi-modal neuroimaging data represented on two- 
dimensional cortical surface models (see Methods section on the 
gradient-based parcellation approach and Supplementary Methods 
4.1-5.3). The approach is similar in spirit to a highly successful 
semi-automated observer-independent approach. However,
instead of objectively identifying potential areal borders in postmortem
histological sections, we identified them algorithmically on the cortical
surface by computing the first derivative of each areal feature map (its
spatial gradient magnitude). Candidate borders were then interpreted
by the neuroanatomists to exclude artefacts. Each area's properties were 
documented (in the Supplementary Neuroanatomical Results), and 
putative areas were related to the extant neuroanatomical literature. 
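
The idea of locating candidate borders from the spatial gradient magnitude of a feature map can be sketched as follows. This is a simplified illustration that assumes a regular two-dimensional grid and finite differences; the study itself computes gradients on cortical surface meshes, and the function name and toy map here are hypothetical.

```python
import math

def gradient_magnitude(grid):
    """Finite-difference gradient magnitude of a 2D grid (list of rows).

    Central differences are used in the interior and one-sided differences
    at the edges. This is a grid-based stand-in for a surface gradient.
    """
    rows, cols = len(grid), len(grid[0])
    if rows < 2 or cols < 2:
        raise ValueError("grid must be at least 2 x 2")
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            i0, i1 = max(i - 1, 0), min(i + 1, rows - 1)
            j0, j1 = max(j - 1, 0), min(j + 1, cols - 1)
            d_row = (grid[i1][j] - grid[i0][j]) / (i1 - i0)
            d_col = (grid[i][j1] - grid[i][j0]) / (j1 - j0)
            out[i][j] = math.hypot(d_row, d_col)
    return out

# Toy feature map: a "lightly myelinated" region (1.0) next to a
# "heavily myelinated" region (2.0). The step between columns 2 and 3
# appears as a ridge in the gradient magnitude.
feature_map = [[1.0, 1.0, 1.0, 2.0, 2.0, 2.0] for _ in range(4)]
ridge = gradient_magnitude(feature_map)
```

High values in `ridge` mark the transition between the two regions, and values away from the step are zero; this is the sense in which gradient ridges serve as candidate areal borders.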

These semi-automated approaches contrast with classical
observer-dependent parcellation approaches that have relied on
visual inspection to locate often subtle transitions in cortical architecture
and with some modern observer-dependent retinotopic parcellation
methods. They also differ from fully automated, unsupervised
methods in which the outcomes depend heavily on algorithmic
input parameters (for example, thresholds or number of requested 
clusters) and are not validated by a neuroanatomist. 

Area 55b illustrates our multi-modal gradient-based parcellation 
approach using gradients of three areal feature maps (see Fig. 2). Area 
55b is a small, elongated, and notably distinct area (outlined in black 
or white) bounded by the frontal eye field (FEF) and premotor eye field 
(PEF), primary motor cortex (4), ventral premotor cortex (6v), and prefrontal
areas 8Av and 8C. In the myelin map (Fig. 2a), area 55b is lightly
myelinated and lies between moderately myelinated areas FEF (above)
and PEF (below), just anterior to heavily myelinated primary motor
cortex (area 4). Thus, area 55b is surrounded on three sides by myelin
gradients (Fig. 2e). Area 55b is strongly activated in the ‘Story versus
Baseline’ task contrast from the HCP’s ‘LANGUAGE’ task (Fig. 2b)
and is entirely surrounded by a strong gradient for this task contrast
(Fig. 2f). It also has distinctive functional connectivity, as revealed by a
seed location (lightly myelinated area PSL) selectively connected with
55b (Fig. 2c) and a different seed location (heavily myelinated area
LIPv) strongly connected with FEF and PEF (Fig. 2d) but not with 55b.
The result is strong mean gradients in dense functional connectivity 
surrounding 55b (Fig. 2g). Ref. 22 illustrated area 55b on a schematic 
surface map (Fig. 2h) as a lightly myelinated area bounded on three 
sides by more heavily myelinated areas. Because of the similarity to the 
dorsal portion of 55b in ref. 22, we use the same name. 




are correlated with somatotopy (see Supplementary Neuroanatomical 
Results 6 and Supplementary Neuroanatomical Results Fig. 8). The 

white oval indicates a small, sharp, and reproducible feature in the right 
hemisphere of the LANGUAGE story contrast. Relative myelin content 
will hereafter be referred to as myelin (see legend of Supplementary Fig. 1 
in Supplementary Results and Discussion 1.1). Data at http://balsa.wustl.edu/WDpx.


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Figure 2 | Parcellation of exemplar area 55b using multi-modal
information. The border of 55b is indicated by a white or black outline.
a, Myelin map. b, Group average beta map from the LANGUAGE Story
versus Baseline task contrast. c, d, Functional connectivity correlation
maps from a seed in area PSL (white sphere, arrow) (c) and a seed in
area LIPv (white sphere, arrow) (d). e, Gradient magnitude of the myelin
map shown in a. f, Gradient magnitude of the LANGUAGE Story versus


To generate the complete parcellation of 180 areas and area 
complexes in each hemisphere, we adopted a systematic, objective, and 
quantitative approach (see the gradient-based parcellation approach 
section in the Methods and in Supplementary Methods 5.1-5.3). Our 
major criteria, met in nearly all cases, included: (i) spatially overlapping 
gradient ‘ridges’ between each pair of areas for at least two independent 
areal feature maps; (ii) similar gradient ridges present in roughly corresponding
locations in both hemispheres; (iii) gradients that were not
correlated with artefacts; and (iv) robust and statistically significant
cross-border differences in the feature maps. Another consideration
(but not a requirement) was whether published evidence exists for a
boundary in an approximately corresponding location. Studies with
publicly available parcellations registered onto atlas surfaces were
directly compared with our data; however, most regions required
indirect comparisons with published figures (for example, Fig. 2h).


Baseline task contrast shown in b. g, Mean gradient magnitude of the 
functional connectivity dense connectome (see section on modalities for 
parcellation in the Methods). h, A dorsal schematic view of the prefrontal 
cortex as parcellated in ref. 22, in which shading indicates the amount 

of myelin found using histological stains of cortical grey matter. Data at 
http://balsa.wustl.edu/Qv4P. 


Initial areal boundaries meeting these criteria were delineated by two 
neuroanatomists (authors M.F.G. and D.C.V.E.).

In a second computational stage, the path of each manually drawn 
border was optimized algorithmically using gradients of the most 
informative feature maps selected by the neuroanatomists (those with 
visually obvious gradients and differences across the border). These 
feature maps were confirmed to have robust and statistically significant 
differences across the final border. The semi-automated gradient-based
parcellation approach is further described in Supplementary Methods
5.1-5.3, and the entire semi-automated process is illustrated for area
V1 in Supplementary Neuroanatomical Results 1; other sections of this 
document describe and illustrate the information used to delineate and 
the literature used to name all 180 cortical areas. 

Figure 3 shows the multi-modal cortical parcellation in the left 
and right hemispheres on inflated and flattened surfaces, with areal 


Figure 3 | The HCP’s multi-modal parcellation, version 1.0 (HCP_MMP1.0).
The 180 areas delineated and identified in both left and right
hemispheres are displayed on inflated and flattened cortical surfaces. Black
outlines indicate areal borders. Colours indicate the extent to which the
areas are associated in the resting state with auditory (red), somatosensory
(green), visual (blue), task positive (towards white), or task negative
(towards black) groups of areas (see Supplementary Methods 5.4).
The legend on the bottom right illustrates the 3D colour space used
in the figure. Data at http://balsa.wustl.edu/WN56.
boundaries delineated by black contours. A total of 180 areas and areal
complexes per hemisphere is near the higher end of earlier estimates
noted above. We consider 180 likely to be a lower bound, as some
parcels are probably complexes of multiple areas (for example, based
on finer-grained published parcellations, and other regions that
suffer from reduced sensitivity due to fMRI signal loss). Many areas
(83/180) were assigned names based on published parcellations from
dozens of separate studies that used a variety of invasive or specialized
methods (see Table 2 in Supplementary Neuroanatomical Results),
reflecting how far the field has been from a consensus neuroanatomical
parcellation. Some of the 97 newly described areas have de novo names
(for example, DVT for the dorsal visual transitional area), while others
represent finer-grained parcellations of previously reported areas
(for example, area 31 into areas 31a, 31pd, and 31pv). A few represent
complexes in which a published finer-grained parcellation was
not visible in our data (for example, areas 29 and 30 combined into
the retrosplenial complex (RSC)), but these may be subdivided
again once higher-resolution data are available. The 180 areas differ
widely in their shapes, sizes, and the positions of their borders relative 
to cortical folds. 

The parcellation in Fig. 3 is coloured to reflect each area’s degree of 
association in the resting state (determined using multiple regression, 
see Supplementary Methods 5.4) with five functionally specialized 
groups of areas: early auditory (red), early somatosensory/motor 
(green), and early visual areas (blue). These represent the three 
dominant input streams to the brain. Also used were two core groups 
of cognitive areas that are strongly anti-correlated in our data, the task 
positive network (towards white) and task negative (also called the 
default mode) network (towards black). Hence, the strongly bluish, 
greenish, and reddish regions are predominantly but not exclusively 
associated with visual, somatosensory-motor, and auditory processing, 
respectively. Qualitatively, the predominantly unimodal regions appear 
to collectively occupy less than half of the neocortical sheet. Areas 
probably more strongly bimodal include blue-green areas such as LIPv
and MT (visual and somatosensory-motor) and purple areas such as
POS2 and RSC (visual and auditory). The remaining regions form a
complex mosaic, with some intermixing of lighter (task-positive) and
darker (task-negative) areas along with many lighter or darker pastel
hues suggestive of ‘cognitive’ areas that may be preferentially associated
with one or another sensory modality. The bilateral symmetry of functional
organization is striking, in that nearly all areas have qualitatively
very similar hues in the left and right hemispheres. However, interesting 
colour asymmetries occur in a few areas, especially language-related 
areas 55b, PSL, SFL, and 44 and their right hemisphere homologues, 
which also have asymmetric task-fMRI functional profiles 
(see Supplementary Neuroanatomical Results 8, 15, 21 and 22). 

Internal heterogeneity is evident in some cortical areas, particularly 
those with topographically organized representations. In the 
somatosensory-motor strip (largely architecturally defined somatosensory
and motor areas 3a, 3b, 1, 2, and 4), we identified five clearly
defined topographic subareas in resting state and task fMRI data
(see Supplementary Neuroanatomical Results 6 and the associated
Supplementary Fig. 8). In this parcellation we treat topographic
subdivisions as ‘subareas’ rather than calling them full ‘areas’. For
visual cortex, its visuotopic organization revealed a set of hemifield
representations in each hemisphere, something not achieved in
previous unsupervised resting state functional connectivity-based
parcellations. Also, ultrahigh-field MRI reveals sub-areal cortical
organization along both laminar and columnar axes, so our
parcellation represents one of many important levels of granularity in 
brain organization. 


Cross-validation of the parcellation

The initial statistical analysis used in the semi-automated parcellation
was circular, to the extent that the 210P dataset was used for both
creating and testing the parcellation. Hence, we carried out an additional
statistical cross-validation using the 210V dataset and a comprehensive
set of feature maps (see the statistical cross validation of the multi-modal 
parcellation section of the Methods and Supplementary Results and 
Discussion 1.2). This analysis also reveals which areal properties were 
most useful in defining areal boundaries (a condensed representation
of the detailed information provided in the Supplementary
Neuroanatomical Results). Supplementary Fig. 6 in Supplementary 
Results and Discussion 1.2 shows four independent categories of 
features: cortical thickness, myelin maps, task fMRI, and resting state 
fMRI, and how many of these categories showed robust and statistically 
significant differences across each areal border. Fully 96% of areal 
borders had robust effect sizes (Cohen’s d > 1) in two or more feature 
categories and all were statistically significant after correcting for 
multiple comparisons in two or more feature categories in cross-border, 
across-subject t-tests. Resting state fMRI was the most useful category,
followed by task fMRI, myelin maps, and lastly cortical thickness,
which was consistent with the neuroanatomists’ observations and 
documentation in the Supplementary Neuroanatomical Results. 
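
The effect-size criterion above (Cohen’s d > 1) can be illustrated with a short sketch. It assumes the common pooled-standard-deviation form of Cohen’s d for two independent samples; the exact variant used in the study is specified in the Methods and may differ (for example, a paired form). The function name and toy values are hypothetical.

```python
import math
import statistics

def cohens_d(sample_a, sample_b):
    """Cohen's d with a pooled standard deviation (two independent samples)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd

# Toy per-subject feature values on either side of a candidate border
# (made-up numbers, for illustration only).
area_x = [3.0, 4.0, 5.0]
area_y = [1.0, 2.0, 3.0]
d = cohens_d(area_x, area_y)
is_robust = abs(d) > 1  # threshold quoted in the text
```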


Exemplar parcellation-based analyses 

Spatial smoothing is often used to increase the signal-to-noise ratio 
(SNR) in neuroimaging analyses, to try to compensate for inaccurate 
registration of brain areas, and/or to satisfy statistical assumptions. 
However, smoothing blurs data across boundaries between areas 
(on the surface) and tissue compartments (in the volume). An areal 
parcellation enables area-wise analyses (averaging data within each 
area), thereby improving SNR and statistical power without the 
deleterious effects of spatial smoothing (to the extent that properties 
within an area are uniform). Parcellation dramatically reduces data 
dimensionality, illustrated here using the HCP’s myelin, thickness, task, 
and resting state data (Fig. 4). 

The ‘dense’ (vertex-wise) myelin map shown in Fig. 4a has ~30,000 
surface grey matter vertices per hemisphere, whereas a ‘parcellated’ 
myelin map (Fig. 4f) shows the same overall pattern with 180 cortical 
areas (vertices within an area have the same value, see also Fig. 4g 
for parcellated cortical thickness). Example dense and parcellated 
task fMRI analysis contrast maps (Figs. 4b, c, LANGUAGE Story 
versus Baseline) can be represented as a single column (white) in a 
180-area by 86-task-contrast matrix (Fig. 4d). Parcellated analyses 
hold great promise for task fMRI studies, as they improve the signal-to-noise
ratio by averaging fMRI time series within parcels prior to
fitting the task design, increasing Z statistics (Fig. 4e). Parcellation is
effectively a neurobiologically constrained smoothing approach that 
also increases statistical power by efficiently consolidating otherwise 
non-independent statistical tests. This approach will benefit studies 
aimed at understanding the functional and structural organization 
of the brain in health or disease at an area-wise level (studies that 
currently summarize results using three-dimensional coordinates in 
a standardized stereotaxic space). Parcellated analyses also aid in the 
clarity and efficiency of communicating results (for example: “area 55b 
in the left hemisphere showed a statistically significant +1% BOLD 
activation in my language task”). 
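
Area-wise averaging of a dense map, as described in this section, can be sketched as follows. The data layout (one value and one area label per vertex) and the function name are assumptions made for illustration and do not reflect the HCP file formats or tools.

```python
from collections import defaultdict

def parcellate(dense_values, parcel_labels):
    """Average per-vertex values within each parcel.

    dense_values: list of per-vertex values.
    parcel_labels: list of area labels, one per vertex.
    Returns a dict mapping each area label to its mean value.
    """
    if len(dense_values) != len(parcel_labels):
        raise ValueError("one label is required per vertex")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(dense_values, parcel_labels):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

# Toy example: five vertices assigned to two areas.
dense = [1.0, 3.0, 2.0, 4.0, 6.0]
labels = ["V1", "V1", "55b", "55b", "55b"]
parcellated = parcellate(dense, labels)
```

Here five vertex values reduce to two area means. Applied to time series rather than single values, the same averaging would precede the fitting of the task design.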

Parcellated analyses are comparably useful when characterizing 
structural or functional connectivity, as previously recognized.
Preprocessing of HCP data results in fMRI data represented as
‘grayordinates’ (cortical grey matter surface vertices and subcortical
grey matter voxels). A dense connectome, containing connectivity
between all pairs of 91,282 grayordinates, is ~3.3 × 10^4-fold larger
than an area-wise parcellated connectome for ~500 areas (connectivity 
between all pairs of areas), yet the parcellated connectome captures 
the neurobiologically relevant variance at the areal level. Parcellated 
connectomes are illustrated using a seed location in area PGi (black 
dot) for full correlation (Fig. 4h) and partial correlation (Fig. 4i) 
functional connectivity brain maps together with their associated 
parcellated connectome matrices (Fig. 4j, full correlation below
and partial correlation above the diagonal). In both cases, the task 
Figure 4 | Example parcellated analyses using the HCP’s multi-modal
cortical parcellation. a, Dense myelin maps on lateral (top) and medial
(bottom) views of inflated left hemisphere. b, c, Example dense (b) and
parcellated (c) task fMRI analysis (LANGUAGE story versus baseline)
expressed as Z statistic values. d, The entire HCP task fMRI battery’s
Z statistics for 86 contrasts (47 unique, see section on modalities for 
parcellation in the Methods) analysed in parcellated form and displayed 
as a matrix (rows are parcels, columns are contrasts, white outline 
indicates the map in c). e, A major improvement in Z statistics from fitting 
task designs on parcellated time series instead of fitting them on dense 
time series and then parcellating afterwards (blue points are 


negative (default mode) network is evident, though the partial 
correlation connectome is much sparser than the full correlation 
connectome. 
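
The size comparison between dense and parcellated connectomes can be checked with simple arithmetic. The sketch below counts the entries of a full square connectivity matrix in each case, using the figures quoted in the text (91,282 grayordinates and ~500 areas); the variable names are illustrative.

```python
# Number of entries in a full (square) connectivity matrix.
n_grayordinates = 91_282   # dense representation, as quoted in the text
n_areas = 500              # approximate area count, as quoted in the text

dense_entries = n_grayordinates ** 2
parcellated_entries = n_areas ** 2

# Ratio of dense to parcellated matrix sizes: roughly 3.3e4.
ratio = dense_entries / parcellated_entries
```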


Individuals with atypical areal patterns 

The precisely aligned group average multi-modal cortical parcellation 
represents the overall spatial arrangement of cortical areas in the 
‘typical’ individual from a healthy young adult population. However, 
we found atypical topological arrangements of some areas in some 
individuals that are discernible across multiple modalities, including 
resting-state networks, task-fMRI activations, and myelin maps. 
Distinguishing genuinely atypical areal topologies from inadequately 
aligned typical patterns depended on the MSMAll areal-feature-based
registration to align cortical areas precisely. We summarize key findings 
here and extensively characterize this important phenomenon in the 
Supplementary Results and Discussion 1.3. 

Previously described area 55b and neighbouring areas FEF and 
PEF showed particularly notable individual differences in topological 
arrangements. For the 210P subjects, 89% showed the typical 
configuration in the left hemisphere (area 55b bordered by area PEF 
inferiorly and area FEF superiorly, as in Fig. 2), which was well aligned 
with group average area 55b after MSMAll registration. However, in one
subgroup (4%, n = 9), a patch having the multi-modal characteristics
of area 55b is shifted superiorly relative to the upper limb subregion
of sensori-motor cortex (Supplementary Fig. 7 in Supplementary
Results and Discussion 1.3). In another subgroup (6%, n = 12), area
360 parcels × 86 task contrasts; note the upward tilting deviation from the
red line). f, Parcellated myelin maps. g, A parcellated folding-corrected 
cortical thickness map (in mm). h, i, Parcellated functional connectivity 
maps on the brain (seeded from area PGi, black dot). These parcellated 
connectomes are computed using either full or partial correlation (see 
Supplementary Methods 7.1). In both cases, the task negative (default 
mode) network is apparent. j, A parcellated connectome matrix view 
with the full correlation connectome below and the partial correlation 
connectome above the diagonal (white line shows the displayed partial 
correlation brain map). Data at http://balsa.wustl.edu/RGOx. 


55b is split into two pieces by a merger of areas FEF and PEF, rather
than the typical splitting of FEF and PEF by 55b (Supplementary 
Fig. 8 in Supplementary Results and Discussion 1.3). Such topological 
deviations in individual subjects’ areal maps raise intriguing questions 
for future exploration. They also cannot be corrected by a topology-preserving
registration aimed at aligning individual subjects’ areas with
the group average ‘atlas’ parcellation. Thus, we introduce an alternative
fully automated cortical parcellation approach that can identify and
delineate both typical and atypical areas in individual subjects that were
not a part of the original 210P group.


Automated individual-subject parcellation 

The semi-automated neuroanatomical approach described above is 
impractical for de novo individual subject parcellation of all ~1,100 
HCP subjects having complete MRI datasets so as to identify the 
atypical areal topologies mentioned above. Instead, we developed an 
automated method for generating individual subject parcellations 
based on a supervised machine learning classifier previously used to 
identify resting state functional networks in individual subjects. In
our case, the areal classifier learns the multi-modal ‘areal fingerprint’
of each cortical area that distinguishes it from surrounding cortex.
Based on multi-modal feature maps that represent the areal
properties of architecture, function, connectivity, and topography,
the areal classifier returns a prediction (0% to 100%) that each area
exists at a given cortical surface vertex. The highest prediction value
across areas at each vertex is used to generate the individual subject
parcellation (see the cortical areal classifier section in Methods and
in Supplementary Methods 6.1-6.8). Once trained using the 210P
subjects (and a separate ‘29T’ group of test subjects, see the subjects and
acquisitions section of the Methods), the areal classifier should be 
able to use only the multi-modal areal fingerprints that it has learned 
to reproduce the parcellation in an independent group of validation 
subjects (210V). 
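
The final step described above, taking the highest prediction value across areas at each vertex, amounts to a winner-take-all assignment. The following sketch illustrates only that step, with hypothetical prediction values; it does not implement the classifier itself, whose training and features are described in the Methods.

```python
def winner_take_all(predictions):
    """Assign each vertex to the area with the highest prediction value.

    predictions: dict mapping area name -> list of per-vertex predictions
    (values between 0 and 1). All lists must have the same length.
    Returns a list with one area name per vertex.
    """
    areas = list(predictions)
    n_vertices = len(predictions[areas[0]])
    if any(len(predictions[area]) != n_vertices for area in areas):
        raise ValueError("all areas need one prediction per vertex")
    labels = []
    for v in range(n_vertices):
        best_area = max(areas, key=lambda area: predictions[area][v])
        labels.append(best_area)
    return labels

# Toy predictions for three areas at three vertices (made-up numbers).
toy_predictions = {
    "55b": [0.90, 0.20, 0.10],
    "FEF": [0.05, 0.70, 0.30],
    "PEF": [0.05, 0.10, 0.60],
}
vertex_labels = winner_take_all(toy_predictions)
```

Because each vertex is labelled independently, an area can appear in an unusual location or in more than one piece, which is what allows atypical topologies to be represented.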

A critical early test of the areal classifier was whether it could 
accurately and reliably map areas that are not aligned with the 
population-based atlas parcellation after MSMAll areal-feature-based
alignment (see Supplementary Results and Discussion 1.4). Examples 
of successful classification of areas 55b, FEF, and PEF are shown in 
Supplementary Fig. 9 of the Supplementary Results and Discussion 
for typical subjects, shifted 55b subjects, and split 55b subjects. In each 
illustrated case, the classifier correctly identified 55b and its neighbours 
(as assessed by the neuroanatomists’ inspection of the multi-modal 
areal features shown in the figure). Supplementary Fig. 10 in the 
Supplementary Results and Discussion 1.4 shows that these atypical 
55b topologies and classifications are stable across widely spaced repeat 
scanning sessions in a ‘test-retest’ group of 27 subjects (see Methods 
section on subjects and acquisition). 


Areal detection and parcellation consistency 

Another critical test of both the parcellation and the areal classifier 
is the classifier’s performance in detecting the 180 cortical areas in 
individual subjects, particularly in independent validation subjects 
that were not used to generate the parcellation or train the classifier. 
The top two rows of Fig. 5 show the performance of the classifier in 
detecting each area (see the cortical areal classifier section in Methods). 
Importantly, the classifier aims to detect whole areas based on their 
multi-modal fingerprints, rather than detecting differences in areal 
features across paired areal boundaries as was done in the cross- 
validation analysis (Supplementary Fig. 6 in Supplementary Results 
and Discussion 1.2). The overall areal detection rate was 98.0% of 
all areas across all subjects for the 210P parcellation and training 
dataset (row 1) and 96.6% for the independent 210V validation 
dataset (row 2), indicating excellent overall performance of the areal 
classifier. 

Figure 5 | Areal detection rates, probabilistic areas, and parcellation 
reproducibility. Rows 1 (210P) and 2 (210V) show the individual subject 
areal detection rates (see Methods section on cortical areal classifier) 
as parcellated maps. Most areas are yellow (100%), and the minimum 
detection rate across both rows was 73%. Rows 3 and 4 illustrate 
probabilistic maps of areas V1, 4, RSC, MT, LIPv, TE1a, 46, and 10r for 
the 210P (row 3) and 210V (row 4) groups. Row 5 shows the original 
parcellation derived from the semi-automated neuroanatomical approach. 
Row 6 shows the group MPM maps from 210P (blue), 210V (red), and 
their overlap (purple). Data at http://balsa.wustl.edu/WL8m. 

The areal classifier was used to generate probabilistic maps of 
each cortical area (illustrating residual variability in spatial location 
after MSMAll areal feature-based registration), and to assess the 
reproducibility of the parcellation in the independent 210V dataset. 
Rows 3 and 4 of Fig. 5 show strikingly similar probabilistic maps of 
8 non-overlapping areas with differing degrees of spatial variability 
(V1, 4, RSC, MT, LIPv, TE1a, 46, and 10r) from the 210P and 210V 
groups. All probability maps were combined to produce a group 
maximum probability map (MPM), where the area with the highest 
probability at each vertex was found. Row 5 shows the original 
semi-automated parcellation borders, and row 6 compares the group MPM 
maps from 210P (blue) and 210V (red), with purple representing 
overlapping vertices. The borders in row 6 are almost entirely purple, 
indicating very high reproducibility of the group MPM maps (r = 0.965, 
Dice = 0.960, see the cortical areal classifier section of Methods). This 
reproducibility is similar to that of the original group average feature 
maps discussed above. The correlation of the original semi-automated 
parcellation (row 5) with the 210P group MPM (row 6) was r = 0.913, 
Dice = 0.902, indicating that the classifier made modest adjustments 
to better fit the data. We predict there will be very high reproducibility 
of the parcellation across the rest of the ~1,100 subject HCP dataset. 
Example individual subject parcellations and their reproducibility 
based on repeated scan sessions are shown in Supplementary Fig. 11 of 
Supplementary Results and Discussion 1.4. The individual parcellations 
are reasonably reproducible (median r = 0.77, Dice = 0.72) but, 
unsurprisingly, not as reproducible as the group parcellations, which 
benefit from averaging across many subjects. Other analyses yield 
interesting information about the sizes of cortical areas in the group 
average and variability in areal size across individuals (Supplementary 
Results and Discussion 1.5). 
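
The Dice overlap reported for the group MPM maps can be illustrated with a short sketch. It assumes each parcellation is stored as a list of integer area labels, one per vertex, and averages the per-area Dice coefficient. How the published value was aggregated is specified in the Methods and Supplementary Methods rather than here, so `mean_dice` is a hypothetical helper, not the authors' code.

```python
from typing import List


def dice(a: List[int], b: List[int], label: int) -> float:
    """Dice coefficient 2|A and B| / (|A| + |B|) for one area label."""
    in_a = sum(1 for x in a if x == label)
    in_b = sum(1 for x in b if x == label)
    both = sum(1 for x, y in zip(a, b) if x == label and y == label)
    if in_a + in_b == 0:
        return 1.0  # area absent from both maps: count as full agreement
    return 2.0 * both / (in_a + in_b)


def mean_dice(a: List[int], b: List[int]) -> float:
    """Average the per-area Dice coefficient over all labels in either map."""
    labels = sorted(set(a) | set(b))
    return sum(dice(a, b, lab) for lab in labels) / len(labels)


# Toy 8-vertex maps standing in for two group MPM parcellations.
group_p = [1, 1, 1, 2, 2, 3, 3, 3]
group_v = [1, 1, 2, 2, 2, 3, 3, 3]
print(round(mean_dice(group_p, group_v), 3))
```

A single disagreeing vertex lowers the Dice value of both areas that touch it, which is why small parcels are more sensitive to border shifts than large ones.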


Generalizing the classifier for future studies 

In contrast to the semi-automated approach (above) where neuroanat- 
omists chose the information to delineate and identify the 180 cortical 
areas in group-average data (see Supplementary Neuroanatomical 
Results), the areal classifier automatically determines (without human 
intervention) what information is most useful for delineating and 
identifying these cortical areas in individual subjects. As illustrated in 
Supplementary Figure 12 of Supplementary Results and Discussion 1.6, 
the areal classifier uses the task fMRI data least, perhaps because task 
fMRI feature maps are noisier in individual subjects than other feature 
maps and their information content is largely redundant with the 
resting state data. This finding is important for generalizability of 
the areal classifier to other studies because replicating the custom- 
ized, hour-long HCP task fMRI battery is unlikely to be feasible for 
most neuroimaging studies. Ideally, the areal classifier would be able 
to perform nearly as well relying only on architecture, connectivity, 
and topography. Accordingly, we trained the classifier again on the 
210P dataset, but omitted the task fMRI-based feature maps. When 
trained this way, the classifier indeed performed nearly as well as 
when all features were used, detecting 97.6% of areas in 210P (versus 
98.0% using all features) and 96.4% of areas in 210V (versus 96.6% 
using all features). Hence, we anticipate that the areal classifier will 
generalize to other studies that acquire the following core set of MRI 
images: high-resolution T1w and T2w; spin echo-based b0 field map; 
and extensive fMRI data acquired using ‘multiband’ pulse sequences to 
improve spatial and temporal resolution (see Supplementary Results 
and Discussion 2.3 and 2.9). These are the same image acquisition 
requirements as the HCP’s minimal preprocessing pipelines and the 
MSMAll areal feature-based registration pipeline (Supplementary 
Methods 2.4). Future studies adhering to these image acquisition guidelines 
will be able to use the unified framework of the HCP’s analysis 
pipelines to automatically generate individualized parcellated analyses 
from unprocessed MRI images, a major advance over traditional 
neuroimaging methods that have often relied on comparisons with 
Brodmann’s hand-drawn parcellation published in 1909 (ref. 1). 


Discussion 
We have produced a population-based 180-area per hemisphere human 
cortical parcellation using exceptionally high quality multimodal data 
from hundreds of Human Connectome Project subjects aligned using 
an improved areal feature-based cross-subject alignment method 
(MSMAll). Inspired by an observer-independent post-mortem architectural 
parcellation approach, we developed a semi-automated neuroanatomical 
approach adapted to non-invasively acquired multi-modal 
MRI data. Although algorithms determined the final areal borders, 
the multi-modal data were carefully interpreted by neuroanatomists, 
the properties of each cortical area were documented, and each area 
was named in relation to the extant neuroanatomical literature (see 
Supplementary Neuroanatomical Results). A cross-validation showed 
that the areas forming the parcellation were robustly and statistically 
significantly different from their neighbours across multiple modalities. 
We identify this parcellation as HCP-MMP1.0 (Human Connectome 
Project Multi-Modal Parcellation version 1.0), making the version 1.0 
designation because we anticipate future refinements as better data 
become available (see Supplementary Results and Discussion 2.1). 
Unexpectedly, we discovered that despite improved intersubject 
alignment, some areas have atypical topological arrangements in 
some subjects, which we demonstrated for areas 55b, FEF, and PEF. 
We developed a fully automated method for parcellating individual 
subjects based on a machine learning classifier that can cope with 
this kind of individual variability. The areal classifier detected 96.6% 
of individual subject cortical areas in new subjects, including atypical 
areas, and replicated the group parcellation in an independent sample. 
Though we made extensive use of the HCP’s specialized task fMRI 
battery when generating the parcellation, we showed that task fMRI 
data are not essential for future studies aiming to use the areal classifier 
to automatically define the cortical areas in their subjects. Instead, it 
suffices to acquire the same core set of MRI images needed for the rest 
of the HCP’s software pipelines. 

By generating a robust neuroanatomical map of human neocortical 
areas—a century-old aim of neuroscience—and providing methods 
for mapping these areas in any individual undergoing study with non- 
invasive neuroimaging, the present work represents a major advance 
relative to previous human cortical parcellations. The overall approach 
described here shows that we can produce sharp, reproducible brain 
images across multiple non-invasive neuroimaging modalities. We can 
generate a highly reproducible and generalizable cortical parcellation 
through state-of-the-art methods of data acquisition, preprocessing, 
and analysis designed to compensate for individual variability and 
thereby minimize blurring of images. These improvements, together 
with the new parcellation, make it desirable to use spatial localiza- 
tion methods that move beyond the traditional use of stereotaxic 
coordinates combined with Brodmann areal assignments to charac- 
terize centers of cortical activation in fMRI studies. From a neuro- 
anatomical perspective, there has often been substantial uncertainty 
whether any two neuroimaging studies have found results in the same 
cortical areas or not. The situation is analogous to astronomy in which 
ground-based telescopes produced relatively blurry images of the sky 
before the advent of adaptive optics and space telescopes. 

Many topics are discussed further in the Supplementary Results 
and Discussion 2.1-2.10 (for example, avenues for improving the 
parcellation and other issues left for future work, further discussion of 
the neuroscientific implications of our results, and additional datasets 
that could profitably be linked to our parcellation). As the topographic 
organization of higher cognitive areas becomes better understood, some 
parcels currently considered to be full areas may later be considered 
to be subareas of larger topographically organized cortical areas 
(analogous to somatotopic subregions of topographically organized 
sensory and motor areas illustrated in Supplementary Neuroanatomical 
Results 6). Though our use of multiple modalities probably mitigates 
this issue relative to traditional uni-modal parcellations, the extent to 
which the human multi-modal cortical parcellation may be revised 
along such lines remains a question for future work using the 
state-of-the-art methods mentioned above (see Supplementary Results and 
Discussion 2.8). 

The MSMAll registration and the areal classifier are or will soon 
be freely available on GitHub; the visualization tool Connectome 
Workbench is on http://humanconnectome.org; and the parcellation, 
data, and scenes for reproducing each of the figures are in the BALSA 
database. These tools provide a neuroanatomical foundation, enabling 
the identification of cortical areas when reporting results or thinking 
about and discussing brain organization in relation to studies of human 
cognition, lifespan, and disease. Several additional interesting avenues 
of investigation are now open. The ability to discriminate individual 
differences in the location, size, and topology of cortical areas from dif- 
ferences in their activity or connectivity should facilitate the dissection 
of how each property is related to behaviour and genetic underpinnings, 
for example, in learning disabilities or those with distinctive cognitive 
traits. The ability to non-invasively and automatically delineate cortical 
areas in living subjects may have clinical implications, for example by 
providing neurosurgeons with detailed, individualized maps of the 
brains on which they operate. There are also important implications for 
our understanding of human cortical evolution. The dramatic expansion 
in neocortex along the human lineage occurred mainly in higher cognitive 
regions of lateral prefrontal, parietal, and temporal cortices. 
Comparisons with nonhuman primates, including marmosets and 
macaques (both widely used in invasive studies), and great apes, may 
yield new insights regarding the emergence of new cortical areas and 
the divergences in areal functions, which collectively led to the cognitive 
capabilities that make us uniquely human as a species and as individuals. 

Note added in proof: A related paper on the neuroimaging approach 
used by the Human Connectome Project may be found in ref. 34. 
In addition, we note that FreeSurfer uses an algorithm to label gyri and 
sulci automatically in individual subjects based on manually generated 
training labels (ref. 35) that is similar in spirit to our areal classifier. Also, 
the FreeSurfer surface modelling noted in the Methods draws from 
methods summarized in ref. 36. 


METHODS 


Subjects and acquisition. A total of 449 young adult twins and non-twin siblings 
(ages 22-35) from the Human Connectome Project (HCP) were scanned according 
to the HCP’s acquisition protocol. The MRI acquisition included collecting 
T1w and T2w structural images, task-based and resting state-based fMRI images, 
diffusion-weighted images, and b0 field maps. Images were acquired at high spatial 
and temporal resolution on a customized Siemens 3 tesla (3T) scanner and with 
customized slice accelerated sequences for fMRI (see Supplementary Methods 
1.1-1.2). All subjects from the HCP 500-subject data release (July, 2014) having 
complete fMRI sessions were included. They were divided into two independent 
groups of 210 subjects that shared no family members between them, together 
with a remaining group of 29 test (29T) subjects that shared family members with 
210P but not 210V. The first group of subjects (210P, 130 females, 80 males) was 
used for creating the parcellation and training the areal classifier, which also made 
use of the 29T group to avoid overfitting. The second group of subjects (210V, 116 
females, 94 males) was used only for statistical cross-validation of the parcellation, 
areal classifier detection rates in independent subjects, and group parcellation 
reproducibility measures. A test-retest group of 27 subjects scanned twice through 
the entire MRI protocol and independently processed through the HCP pipelines 
was used for individual subject reproducibility measures. Subject recruitment 
procedures and informed consent forms, including consent to share de-identified 
data, were approved by the Washington University institutional review board. 
Datasets were de-identified and are publicly shared on the ConnectomeDB 
database (https://db.humanconnectome.org). 
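
One way to guarantee that two groups share no family members is to assign whole families, never individual subjects, to a group. The sketch below illustrates that idea under stated assumptions: the `family_of` mapping, the subject and family identifiers, and the greedy fill strategy are all hypothetical, and the code does not reproduce how the 210P, 210V, and 29T groups were actually drawn.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_by_family(family_of: Dict[str, str], target: int) -> Tuple[List[str], List[str]]:
    """Place whole families in group A until it holds `target` subjects.

    Families that would overflow group A go to group B, so no family is
    ever divided between the two groups.
    """
    families = defaultdict(list)
    for subject, family in family_of.items():
        families[family].append(subject)

    group_a: List[str] = []
    group_b: List[str] = []
    for family in sorted(families):  # sorted for a deterministic result
        members = sorted(families[family])
        if len(group_a) + len(members) <= target:
            group_a.extend(members)
        else:
            group_b.extend(members)
    return group_a, group_b


# Hypothetical subject -> family assignments.
family_of = {"s1": "f1", "s2": "f1", "s3": "f2", "s4": "f3", "s5": "f3"}
group_a, group_b = split_by_family(family_of, target=3)
print(group_a, group_b)  # ['s1', 's2', 's3'] ['s4', 's5']
```

Keeping relatives together matters because related subjects have correlated brain features, which would otherwise make the validation group less independent of the training group.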

Image preprocessing. Spatial image preprocessing (distortion correction and 
image alignment) was carried out using the HCP’s spatial minimal preprocessing 
pipelines. These pipelines maximize alignment across image modalities, minimize 
distortions relative to the subject’s anatomical space, and minimize spatial 
smoothing (blurring) of the data. The data were projected into the 2 mm standard 
CIFTI grayordinates space, which includes cortical grey matter surface vertices 
and subcortical grey matter voxels. This offers substantial improvements in spatial 
localization over traditional volume-based analyses, enabling more accurate 
cross-subject and cross-study registrations and avoiding smoothing that mixes 
signals across differing tissue types or between nearby cortical folds. Additionally, 
we did minimal smoothing within the CIFTI grayordinates space to avoid mixing 
across areal borders prior to parcellation. 

For cross-subject registration of the cerebral cortex, we used a two-stage process 
based on the multimodal surface matching (MSM) algorithm (see Supplementary 
Methods 2.1-2.5). An initial ‘gentle’ stage, constrained only by cortical folding 
patterns (FreeSurfer’s ‘sulc’ measure), was used to obtain approximate geographic 
alignment without overfitting the registration to folding patterns, which are not 
strongly correlated with cortical areas in many regions. Previously, we found that 
more aggressive folding-based registration (either MSM-based or FreeSurfer- 
based) slightly decreased cross-subject task-fMRI statistics, suggesting that 
aligning cortical folds too tightly actually reduces alignment of cortical areas. A 
second, more aggressive stage used cortical areal features to bring areas into better 
alignment across subjects while avoiding neurobiologically implausible distortions 
or overfitting to noise in the data. The areal features used were myelin maps, resting 
state network maps computed with weighted regression (an improvement over 
dual regression, described in the Supplementary Methods 2.3) and resting state 
visuotopic maps (see Supplementary Methods 4.4). Areal distortion was measured 
by taking the log base-2 of the ratio of the registered spherical surface tile areas to 
the original spherical surface tile areas. The mean (across space) of the absolute 
value of the areal distortion averaged across subjects from both registration 
stages was 30% less than the standard FreeSurfer folding-based registration and 
the maximum (across space) of this measure was 54% less. Despite less overall 
distortion, the areal-feature-based registration delivers substantially more accurate 
registration of cortical areas than does FreeSurfer folding-based registration as 
judged by cross-subject task fMRI statistics, an areal feature that was not used 
to drive the registration. Because MSM registration preserves topology and is 
relatively gentle (it does not tear or distort the cortical surface in neurobiologically 
implausible ways), it is unable to align some cortical areas in some subjects 
where the areal arrangement differs from the group average (see Supplementary 
Results and Discussion 1.3-1.4 for more details on atypical areas). Group average 
registration drift away from the gentle folding-based geographic alignment was 
removed from the surface registration (see Supplementary Methods 2.5) to enable 
comparisons of this dataset with datasets registered using different areal features 
(for example, retinotopically defined areas). Group average registration drift is any 
consistent effect of the registration during template generation on the mean size, 
shape, or position of areas on the sphere (as opposed to the desired reductions in 
cross-subject variation). An obvious example is the 37% increase in average brain 
volume produced by registration to MNI space. Uncorrected drifts during surface 
template generation can cause apparent changes in cortical areal size, shape, and 
position when comparing across studies. 
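
The areal distortion measure defined above is a base-2 logarithm of an area ratio, so a value of +1 means a surface tile doubled in area and -1 means it halved. A minimal sketch, assuming per-tile areas are available as plain lists (the function names and toy values are hypothetical):

```python
import math
from typing import List


def areal_distortion(registered: List[float], original: List[float]) -> List[float]:
    """log2(registered area / original area) for each spherical surface tile."""
    return [math.log2(r / o) for r, o in zip(registered, original)]


def mean_abs_distortion(distortion: List[float]) -> float:
    """Mean across space of the absolute areal distortion."""
    return sum(abs(d) for d in distortion) / len(distortion)


# Three toy tiles: one doubled, one halved, one unchanged.
original_areas = [1.0, 1.0, 2.0]
registered_areas = [2.0, 0.5, 2.0]
distortion = areal_distortion(registered_areas, original_areas)
print(distortion)                       # [1.0, -1.0, 0.0]
print(max(abs(d) for d in distortion))  # 1.0
```

Taking the absolute value before averaging keeps expansions and compressions from cancelling, which is why the text reports the mean and maximum of the absolute distortion.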

Resting state fMRI data were denoised for spatially specific temporal artefacts 
(for example, subject movement, cardiac pulsation, and scanner artefacts) using 
the ICA+FIX approach, which includes detrending the data and aggressively 
regressing out 24 movement parameters. We avoided regressing out the ‘global 
signal’ (mean grey-matter time course) from our data because preliminary analyses 
showed that this step shifted putative connectivity-based areal boundaries so that 
they lined up less well with other modalities, likely because of the strong areal 
specificity of the residual global signal after ICA+FIX clean up. Task fMRI data 
were temporally filtered using a high pass filter. More details on resting state and 
task fMRI temporal preprocessing are described in the Supplementary Methods 
1.6-1.8. Substantial spatial smoothing was avoided for both datasets, and all images 
were intensity normalized to account for the receive coil sensitivity field. Artefact 
maps of large vein effects, fMRI gradient echo signal loss, and surface curvature 
were computed as described in Supplementary Methods 1.9. 

Modalities for parcellation. The multi-modal cortical parcellation used informa- 
tion related to the four areal properties of architecture, function, connectivity, and 
topography. Architecture was measured using T1w/T2w myelin content maps plus 
cortical thickness maps with surface curvature regressed out (Supplementary 
Methods 1.5). Function was measured using task-fMRI responses to 7 tasks in 
86 task contrasts (47 unique; 39 were sign-reversed contrasts). Effect size maps 
(beta maps) after correction for the receive field were used instead of Z statistic 
maps because we were interested in regional differences in the magnitude of 
the BOLD (blood oxygen level dependent) signal change induced by the tasks, 
rather than differences in the significance of the BOLD signal change. Functional 
connectivity was measured using pairwise Pearson correlation of the denoised 
resting state time series of each pair of grayordinates. Topographic organization 
was explored using resting state time series in visual cortex, with spatial regressors 
representing polar angle and eccentricity patterns in area V1 combined with a 
modified ‘dual-regression-like’ approach that weights each surface vertex according 
to the cortical surface area that it represents (see Supplementary Methods 4.4). The 
semi-automated multi-modal parcellation was generated using group average data 
for all of these modalities from the 210P group of subjects (see Supplementary 
Methods 3.1-3.3 for details on how the group averages were created for each 
modality). The reproducibility of these group average maps was assessed by 
correlating the spatial maps for the 210P and 210V groups (see Supplementary 
Results and Discussion 1.1). 
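
Functional connectivity as defined above is the Pearson correlation between every pair of time series. The following sketch shows the computation on three toy time series; it is written with plain lists for clarity and is not intended to scale to the full grayordinate space. The function names are assumptions for the example.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length time series.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def connectivity_matrix(series: List[List[float]]) -> List[List[float]]:
    """Pairwise correlation matrix; entry [i][j] correlates series i and j."""
    n = len(series)
    return [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]


# Three toy time series: the second tracks the first, the third opposes it.
toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
for row in connectivity_matrix(toy):
    print([round(value, 2) for value in row])
```

The diagonal of such a matrix is always 1, which is why the statistical cross-validation ignores tests involving the diagonal of the parcellated connectome.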

The gradient-based parcellation approach. Classically, cortical areas have 
been defined based on sharp changes in one or more of the areal properties of 
architecture, function, connectivity, and topography. Traditionally, this relied 
heavily on visual inspection, until more objective and quantitative approaches 
became available. One highly successful approach to post-mortem architectural 
parcellation involves computing a dissimilarity metric (the Mahalanobis distance) 
between neighbouring feature profiles generated from segmented histological 
images and testing for statistically significant and large spikes in dissimilarity that 
indicate putative areal boundaries. For in vivo data, a similarly powerful approach 
involves taking the first derivative (the spatial gradient) of a measure of interest 
along the cortical surface and using the gradient magnitude to objectively identify 
locations where the measure is changing rapidly. One can then draw putative areal 
boundaries along the resulting gradient ridges. Here we combined elements 
of both approaches in a multi-modal context to generate semi-automatically 
drawn areal borders that were then evaluated statistically. Gradients were 
computed for architectural, functional, connectivity, and topographic modalities 
(see Supplementary Methods 4.1-4.4). 
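
The gradient idea can be illustrated in one dimension. The sketch below takes a measure sampled at equally spaced points along a path across the cortex, computes the gradient magnitude by finite differences, and reports the sample where it peaks. This is a simplification: the actual gradients are computed on the two-dimensional cortical surface mesh, and the function names, sample spacing, and toy myelin values here are assumptions for illustration.

```python
from typing import List


def gradient_magnitude(profile: List[float], spacing_mm: float = 1.0) -> List[float]:
    """Absolute first derivative: central differences, one-sided at the ends."""
    n = len(profile)
    grad = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        grad.append(abs(profile[hi] - profile[lo]) / ((hi - lo) * spacing_mm))
    return grad


def putative_border(profile: List[float], spacing_mm: float = 1.0) -> int:
    """Index of the sample where the measure changes most rapidly."""
    grad = gradient_magnitude(profile, spacing_mm)
    return max(range(len(grad)), key=grad.__getitem__)


# Toy myelin profile: a lightly myelinated region, a sharp rise, then a plateau.
myelin = [1.0, 1.0, 1.1, 1.7, 1.9, 1.9]
print(putative_border(myelin))  # 3
```

On a surface the peak becomes a ridge line rather than a single point, and a border is considered well supported when ridges from independent modalities coincide.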

To incorporate expert knowledge and priors from the neuroanatomical 
literature into the parcellation process, the neuroanatomists (authors M.F.G. and 
D.C.V.E.) evaluated the multi-modal neuroimaging data and its gradients to define 
initial areal borders based on the following criteria. (1) Presence of a co-localized 
gradient ridge in at least two independent modalities was taken as strong evidence 
of an areal border, and the vast majority of areal borders satisfied this criterion. 
(2) Presence of corresponding gradients in the left and right hemispheres provided 
further evidence for a genuine areal border. For the vast majority of borders, 
the same modalities yielded robust gradients in both hemispheres. We did not 
find strong evidence for an area present in one hemisphere that was absent in 
the other (though a few areas show hemispheric asymmetries in their functional 
‘signature’ and/or in their spatial relationships with neighbouring areas). (3) We 
ignored gradients clearly attributable to imaging artefacts (see Supplementary 
Neuroanatomical Results for details). (4) Cortex on opposite sides of the border 
needed to differ robustly and significantly in the areal features used to delineate the 
border. (5) Confidence was increased if prior literature described a corresponding 
areal border. (6) Early runs of a supervised machine learning algorithm (see the 
cortical areal classifier section of the Methods below) needed to be able to learn to 
distinguish each cortical area from its neighbours in a large majority of individual 
subjects based on individual subject multi-modal features (the early runs were only 
done using 210P and 29T, keeping 210V independent for later analyses). After the 
neuroanatomists delineated the initial areal borders and chose the important areal 
features that defined them, an automated algorithm then optimized the border 
placement so that it followed the most probable path based on the chosen areal 
features (see Supplementary Methods 5.1-5.3). The Supplementary 
Neuroanatomical Results documents the information that was used to distinguish 
each of the 180 areas from its neighbours. 

The neuroanatomists named areas based on previous parcellations whenever 
a reasonable match to the literature could be made. In some cases, areal identification 
was based on the similarity of the area's properties relative to previously 
reported areas (for example, area 4, primary motor cortex, is known to be heavily 
myelinated and thick; area V2 has a mirror-image visuotopic map relative to 
neighbouring area V1). In most cases, however, the information used to describe 
previous cortical areas (for example, cytoarchitecture) was not available in the HCP 
data, and areal identification mainly reflected spatial correspondences relative to 
cortical folding patterns (if reliable for that region of cortex) or spatial relationships 
between neighbouring cortical areas. The strongest evidence for areal identification 
came from studies that provided surface-based probabilistic or maximum 
probability maps, ideally also registered using areal features and dedrifting of 
templates. In these cases, we directly compared these data with our data and 
show the degree of overlap in the Supplementary Neuroanatomical Results. When 
such data were unavailable, we used published information to the degree feasible 
(see Supplementary Methods 5.3 for limitations of non-surface-based/not publicly 
available data) to make areal identifications or to describe new areas that had not 
previously been identified. The information used to name each cortical area is 
described in the Supplementary Neuroanatomical Results. 

Statistical cross validation of the multi-modal parcellation. Once the parcellation 
has been created, parcellated representations of data from each modality can be 
generated using either the group parcellation or the individual subject parcellations. 
For the statistical cross-validation, we created parcellated myelin, cortical thickness, 
task fMRI, and resting state functional connectivity datasets using the semi-automated 
multimodal group parcellation (see Supplementary Methods 7.1). For 
myelin and cortical thickness, we simply averaged the values of the dense individual 
subject maps within each area. For task fMRI, we averaged the time series within 
each area prior to computing task statistics (to benefit from the SNR improvements 
of parcellation demonstrated in Fig. 4e). For the same reason, we averaged resting 
state time series within each parcel prior to computing functional connectivity to 
form a parcellated functional connectome. 
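
Averaging within each area before computing statistics can be sketched as follows. The example assumes the dense data are a list of per-vertex time series and the parcellation is a parallel list of area labels; `parcellate_timeseries` is a hypothetical name, and the values are toy data.

```python
from typing import Dict, List


def parcellate_timeseries(dense: List[List[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Average the time series of all vertices that share an area label.

    dense[v] is the time series at vertex v and labels[v] is its area.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for series, label in zip(dense, labels):
        if label not in sums:
            sums[label] = [0.0] * len(series)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], series)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


# Three vertices, two time points; the first two vertices belong to area "A".
dense = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
labels = ["A", "A", "B"]
print(parcellate_timeseries(dense, labels))  # {'A': [2.0, 3.0], 'B': [10.0, 20.0]}
```

Averaging first and then computing task statistics or correlations lets uncorrelated noise across vertices cancel, which is the SNR benefit the text refers to.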

For each pair of areas that shared a border in the parcellation, we computed a 
paired samples two-tailed t-test across subjects on these parcellated data for each 
feature (ignoring tests that involved the diagonal in the resting state parcellated 
functional connectome). We thresholded these tests at the Bonferroni-corrected 
significance level of P < 9 × 10⁻⁸ (0.05 divided by the product of the number of 
area pairs across both hemispheres (1,050), the number of features (266), and the 
number of tails (2)) and an effect size threshold of Cohen's d > 1. We grouped the 
features into 4 independent categories (cortical thickness, myelin, task fMRI, and 
resting state fMRI) to determine for each area pair whether it showed robust and 
statistically significant 
differences across multiple modalities. For more details, see Supplementary 
Methods 7.2. 
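
The two thresholds used in the cross-validation can be reproduced with a short calculation. The Bonferroni arithmetic below follows the numbers given in the text. The effect size uses the paired-samples form of Cohen's d (mean difference divided by the standard deviation of the differences), which is an assumption here because the exact variant is specified in Supplementary Methods 7.2 rather than in this section. The function names and sample values are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def bonferroni_threshold(alpha: float, n_pairs: int, n_features: int, n_tails: int) -> float:
    """Divide the nominal alpha by the total number of comparisons."""
    return alpha / (n_pairs * n_features * n_tails)


def paired_t_and_d(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Paired-samples t statistic and Cohen's d (paired form, assumed)."""
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    d = mean_diff / sd_diff
    return t, d


threshold = bonferroni_threshold(0.05, n_pairs=1050, n_features=266, n_tails=2)
print(f"{threshold:.2e}")  # 8.95e-08

# Toy feature values for two neighbouring areas in four subjects.
t_stat, cohen_d = paired_t_and_d([5.0, 6.0, 7.0, 9.0], [3.0, 4.0, 6.0, 6.0])
print(cohen_d > 1.0)  # True
```

Requiring both a corrected P value and a large effect size guards against borders that are statistically detectable only because of the large sample.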

The cortical areal classifier. We used a supervised machine learning classifier to 
automatically delineate and identify each cortical area from its neighbours across 
a large majority of individual subjects based on multi-modal information. Besides 
validating the robustness of the parcellation, this provides useful information about 
each individual subject’s parcellation, along with an approach to generalizing the 
parcellation to other datasets. To automatically parcellate individual subjects, we 
adapted the multi-layer perceptron used by ref. 29 to delineate and identify seven 
resting state networks more accurately than simpler linear methods including 
dual regression. We used the multi-layer perceptron to classify all 180 areas in our 
parcellation using multi-modal feature maps and relied on two neuroanatomically 
sensible assumptions to simplify the problem. (1) After areal feature-based registration 
(MSMAll), we assumed that each cortical area was approximately in the same 
general location across subjects (for example, we don’t expect to find V1 outside 
the occipital lobe). This also means that we consider widely separated regions 
having similar multi-modal areal fingerprints to be distinct cortical areas even if 
they have similar architecture, coactivation in functional tasks, and belong to the 
same resting state network. These assumptions allowed us to reduce the overall 
classification problem to a set of 180 classification problems per hemisphere, each 
involving discrimination of one area from the areas around it. (2) Also, instead 
of classifying each area from all of its neighbours specifically (one class for the 
area plus one class for each neighbouring area), we set up the problem as a binary 
classification (the most robust kind of classification problem), classifying each 
area from all of the surrounding cortex as a single alternate class. This surrounding 
cortex represents a ‘searchlight’ for the area, and this searchlight was the group 
parcel location plus a 30 mm radius surrounding the group parcel in all directions 
across the surface (meaning that for a 10 mm circular area, the searchlight would be 
a circle of 70 mm in diameter, still a quite large region of cortex). The 30 mm radius 
(geodesic distance computed on the group average mid-thickness surface corrected 
for vertex area loss due to averaging) was chosen because it easily encompassed 
the individual variation in area 55b in the 210P group (55b approaches a worst 
case because it is a relatively small and highly variable cortical area). The training 
labels were the group area from the semi-automated parcellation (class 1), and the 
remaining cortex in the searchlight (class 2). 
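
The searchlight labelling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: it assumes that a geodesic distance (in mm) from every vertex to the group parcel has already been computed, and the names `searchlight_labels`, `dist_to_parcel_mm` and `searchlight_diameter_mm` are hypothetical.

```python
SEARCHLIGHT_RADIUS_MM = 30.0  # searchlight radius stated in the text


def searchlight_labels(in_group_parcel, dist_to_parcel_mm,
                       radius_mm=SEARCHLIGHT_RADIUS_MM):
    """Return one training label per vertex for a single area.

    1 = vertex inside the group parcel (class 1)
    2 = vertex outside the parcel but inside the searchlight (class 2)
    0 = vertex outside the searchlight (not used for this area)
    """
    labels = []
    for inside, dist in zip(in_group_parcel, dist_to_parcel_mm):
        if inside:
            labels.append(1)
        elif dist <= radius_mm:
            labels.append(2)
        else:
            labels.append(0)
    return labels


def searchlight_diameter_mm(area_diameter_mm, radius_mm=SEARCHLIGHT_RADIUS_MM):
    """Searchlight diameter for an idealized circular area (assumption)."""
    return area_diameter_mm + 2.0 * radius_mm
```

For an idealized circular area 10 mm across, `searchlight_diameter_mm(10.0)` reproduces the 70 mm figure quoted in the text, because the 30 mm margin is added on both sides of the area.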

The features used by the classifier covered the same set of modalities used for 
the original parcellation, including architectural measures of myelin and cortical 
thickness with curvature regressed out; task fMRI maps (redundant information 
was reduced and SNR increased with a d = 20 ICA-decomposition run on the
task contrast beta maps, see Supplementary Methods 6.4); the 77 surface-related
resting state fMRI network maps computed on individual subjects using weighted
regression from an overall d = 137 group ICA; five visuotopic topographic maps
transformed into a format interpretable by the classifier; and maps of artefacts 
that the classifier used to interpret differences in areal features due to artefac- 
tual effects (see Supplementary Methods 6.3-6.5 for further description of each 
modality’s classifier features). These 112 multi-modal feature maps were gener- 
ated for each vertex in each of the 449 subjects and the 27 repeated subjects, with 
each hemisphere processed separately. Other than the 30 mm radius searchlight 
region of interest (ROI), the classifier has no spatial concept of where the area 
should be (it operates independently on each vertex and only knows what the 
area’s fingerprint looks like in the feature space). Consequently, special consid- 
eration was given to the spatial visuotopic patterns, which were transformed into 
maps whose values reflected the alternating mirror symmetric organization of 
visual areas (that is, maps whose values reflect the orientation of the visuotopic 
gradient vector relative to the vector that points ‘geodesically’ towards V1, see 
Supplementary Methods 6.5). 

The classifier analyses were conducted using a standard machine learning train/ 
test/validation approach. The classifier was trained using the 210P subjects and 
tested against overfitting using the extra 29T subjects. The 210V subjects were 
used as the validation sample, and thus were not involved in the classifier training, 
testing, or the parcellation itself, and also shared no family relationships with the 
210P or the 29T groups. A short initial run of the classifier was used to identify 
features that the classifier was particularly sensitive to for each area (see below and 
Supplementary Methods 6.6). These features were compared in each individual 
subject with the group average pattern to exclude subjects that were potentially 
misaligned with the typical subject in this region (and hence for which the group 
defined training labels were likely inaccurate). This area-specific set of subjects 
in the 210P and 29T groups were excluded from the final classifier training of 
each area. The classifier’s output (ranging from 0 to 1) represents the likelihood 
that a given vertex in a subject is part of the area being classified or part of the 
surrounding cortex of the searchlight. Once the classifier training weights have 
been generated, it is possible to classify any subject who has the 112 multi-modal 
maps computed, including those whose areas are misaligned with the group 
(see Supplementary Results and Discussion 1.4). 

The trained classifier was applied to the 449 subjects and 27 repeat subjects 
to generate individual subject likelihood maps for each of the 180 areas in each 
hemisphere. These probability maps were combined by finding the largest 
probability for each vertex and then regularized within local neighbourhoods 
(see Supplementary Methods 6.7) to make an individual subject ‘winner-take-all’
parcellation. An area was considered to have been detected in a subject for
the purposes of the areal detection measures (the overall classifier areal detection 
rate and the maps of areal detection rate for each area) if its size was between 
1/3 and 3 times the size of the original population-based parcel (a pragmatic 
threshold chosen prior to performing the analysis that tolerates modestly greater 
neuroanatomical variability across subjects than the empirical range reported in 
cytoarchitectonic studies*!”). Probabilistic maps of each area were then created 
by separately averaging the individual subject winner-take-all parcellation areas 
for the 210P and 210V subject groups. A group maximum probability map (MPM) 
parcellation was then created by assigning the identity of the maximum areal 
probability to each vertex. The reproducibility of the parcellation was assessed 
by correlating these two MPM maps and by computing a Dice coefficient. 
In both cases the parcellation was first turned into 180 concatenated binary 
ROIs per hemisphere (each area was represented by a separate map, ~30,000 
vertices per hemisphere, with ones for all vertices inside the area and zeros 
for all vertices outside). The reproducibility of the individual subject hard
parcellation maps was assessed similarly. For more details, see Supplementary 
Methods 7.2. 
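
The winner-take-all assignment, the areal detection rule and the Dice comparison described above can be sketched as below. This is an illustrative sketch only: the local regularization step (Supplementary Methods 6.7) is omitted, treating the 1/3 and 3 times bounds as inclusive is an assumption, and all function names are hypothetical.

```python
def winner_take_all(prob_by_area):
    """Assign each vertex to the area with the largest probability.

    prob_by_area maps an area name to a list of per-vertex probabilities.
    The local regularization step mentioned in the text is omitted here.
    """
    areas = sorted(prob_by_area)
    n_vertices = len(prob_by_area[areas[0]])
    return [max(areas, key=lambda a: prob_by_area[a][v])
            for v in range(n_vertices)]


def area_detected(individual_size, group_size):
    """Detection rule: size between 1/3 and 3 times the group parcel size.

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    return group_size / 3.0 <= individual_size <= 3.0 * group_size


def concatenated_binary_rois(parcellation, areas):
    """One binary map per area (1 inside, 0 outside), concatenated end to end."""
    return [1 if label == area else 0
            for area in areas for label in parcellation]


def dice(rois_a, rois_b):
    """Dice coefficient between two equal-length binary vectors."""
    overlap = sum(1 for a, b in zip(rois_a, rois_b) if a and b)
    total = sum(rois_a) + sum(rois_b)
    if total == 0:
        raise ValueError("Dice is undefined when both vectors are empty")
    return 2.0 * overlap / total
```

Two parcellations would be compared by passing each through `concatenated_binary_rois` with the same area order and then calling `dice` on the two resulting vectors.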

Multi-modal areal fingerprints learned by the classifier were visualized using 
a classifier sensitivity metric. This metric was the partial derivative with respect 
to each feature of each area multiplied by the gradient magnitude of the feature 
(see Supplementary Methods 6.8). The measure indicates which areal features 
the classifier finds most informative when classifying a given area and whether 
increases or decreases in the value of the feature make the area more likely to be 
present. The sensitivity metric can be visualized both at the dense (vertex-wise) 
level for each feature and each area, or summarized at a parcel level. For each 
feature, the sensitivity metric was summarized at the parcel level by taking the 
maximum absolute value of the metric (finding the border where the feature was 
most influential) and using this maximum to represent the area in a parcellated 
or a matrix view, as shown in Supplementary Fig. 12 of Supplementary Results 
and Discussion 1.6. 
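
A minimal sketch of the sensitivity summary described above, for one feature and one area. It assumes that the partial derivatives and the feature gradient magnitudes are already available per vertex; keeping the sign of the most extreme value in the parcel-level summary is an assumption of this sketch, and the function names are hypothetical.

```python
def vertex_sensitivity(d_output_d_feature, feature_gradient_magnitude):
    """Per-vertex sensitivity: partial derivative times gradient magnitude."""
    return [d * g for d, g in zip(d_output_d_feature, feature_gradient_magnitude)]


def parcel_summary(sensitivity):
    """Summarize an area by the value with the largest absolute magnitude.

    The sign is retained (an assumption) so that the summary still shows
    whether increases or decreases in the feature favour the area.
    """
    return max(sensitivity, key=abs)
```

Applying `parcel_summary` to every feature and every area yields the entries of a feature-by-area matrix view.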

SAR11 bacteria linked to ocean anoxia and nitrogen loss
Neha Sarode, Rex R. Malmstrom, Cory C. Padilla, Benjamin K. Stone, Laura A. Bristow, Morten Larsen, Jennifer B. Glass,
Bo Thamdrup, Tanja Woyke, Konstantinos T. Konstantinidis & Frank J. Stewart


Bacteria of the SAR11 clade constitute up to one half of all microbial cells in the oxygen-rich surface ocean. SAR11 bacteria 
are also abundant in oxygen minimum zones (OMZs), where oxygen falls below detection and anaerobic microbes have 
vital roles in converting bioavailable nitrogen to N2 gas. Anaerobic metabolism has not yet been observed in SAR11, and 
it remains unknown how these bacteria contribute to OMZ biogeochemical cycling. Here, genomic analysis of single 
cells from the world’s largest OMZ revealed previously uncharacterized SAR11 lineages with adaptations for life without 
oxygen, including genes for respiratory nitrate reductases (Nar). SAR11 nar genes were experimentally verified to encode 
proteins catalysing the nitrite-producing first step of denitrification and constituted ~40% of OMZ nar transcripts, with 
transcription peaking in the anoxic zone of maximum nitrate reduction activity. These results link SAR11 to pathways of 
ocean nitrogen loss, redefining the ecological niche of Earth’s most abundant organismal group. 


Alphaproteobacteria of the SAR11 clade form one of the most ecologi- 
cally dominant organism groups on the planet, representing up to half 
of the total microbial community in the oxygen-rich surface ocean!®. 
All characterized SAR11 isolates, including the globally ubiquitous 
Candidatus Pelagibacter genus, are aerobic heterotrophs adapted 
for scavenging dissolved organic carbon and nutrients under the 
oligotrophic conditions of the open ocean®°. Gene-based surveys 
have also revealed diverse SAR11 lineages at high abundance in the 
deep waters of the meso- and bathypelagic realms'"'*. However, the 
functional properties that distinguish SAR11 bacteria living in distinct 
ocean regions remain unclear. All known SAR11 genomes are small 
(typically less than 1.5 megabase pairs (Mb)), with genomic stream- 
lining as a potential adaptation to the nutrient-limiting conditions of 
the open ocean!!. It has been hypothesized that adaptations in SAR11 
do not involve large variations in gene content®*, suggesting that the 
contribution of SAR11 to ocean biogeochemistry is primarily through 
its role in aerobic oxidation of organic carbon. 

Although genetic or biochemical evidence of anaerobic metabo- 
lism has not been reported for SAR11, high abundances of SAR11- 
related genes have been detected under anoxic conditions in marine 
OMZs. Permanent OMZs extend over ~8% of the oceanic surface area 
(oxygen (O₂) < 20 µM)!4, with the largest and most intense OMZs in
upwelling regions of the Eastern Pacific. In the cores of these regions,
microbial respiration of high surface primary production combines
with low ventilation to deplete O₂ from mid-water depths, resulting
in O₂ concentrations below detection (~10 nM) over a major portion
(~100-700 m) of the water column”. In the absence of O₂, respiratory
nitrate (NO₃⁻) reduction to nitrite (NO₂⁻) becomes the dominant
process for organic matter oxidation!®, with respiratory Nar proteins
being among the most abundant and highly expressed enzymes
in OMZs!7-!°. NO₃⁻ respiration results in a substantial accumulation
of NO₂⁻ in OMZs, often to micromolar concentrations”°. This
NO₂⁻ pool is actively cycled through NO₂⁻-consuming microbial
metabolisms, notably the anaerobic processes of denitrification and
anaerobic ammonium oxidation (anammox)?!”’, which together in
OMZs account for 30-50% of the loss of bioavailable nitrogen from
the ocean as either gaseous dinitrogen (N₂) or nitrous oxide (N₂O)???.
Surprisingly, SAR11 bacteria are often the most abundant organisms
in the NO₂⁻-enriched N-loss zone of OMZs, where O₂ is undetectable,
representing ~20% (range: 10-40%) of all 16S ribosomal RNA genes
and protein-coding metagenome sequences in the 0.2-1.6 µm biomass
fraction'®!>34. Such high abundances imply that SAR11 make up a
substantial fraction of the OMZ community and raise the question of 
the role of SAR11 in OMZ biogeochemistry. 

We analysed single amplified genomes (SAGs) to identify the 
metabolic basis for the dominance of SAR11 in anoxic OMZs. We 
focused on SAR11 SAGs obtained from the Eastern Tropical North 
Pacific (ETNP) OMZ off Mexico, the world’s largest OMZ, accounting 
for 41% of global OMZ surface area'* (Fig. 1a). O₂ concentration
at this site declined from ~200 µM at the surface to ~400 nM at
the bottom of the oxycline (30-85 m) and was typically at or below
the detection limit (~10 nM) from ~90 m to 700 m. At the time of
sample collection, NO₃⁻ reduction rates increased with depth into the
OMZ, peaking at ~9.5 nM N d⁻¹ at 300 m (ref. 19), paralleling
an increase in the abundance of sequences encoding Nar-type
NO₃⁻ reductases in coupled metagenomes and metatranscriptomes
(Fig. 1c). In contrast, aerobic NO₂⁻ oxidation peaked at 100 m
(260 nM N d⁻¹), where trace O₂ was available and NO₂⁻ was abundant,
before declining 20-fold with depth into the OMZ (Fig. 1c).
However, NO₂⁻ oxidation rates are probably overestimated due to slight
O₂ contamination in incubations”. These data highlight a transition
to anoxia within the ETNP OMZ!>!’, with in situ O₂ concentration
at least an order of magnitude lower than the inhibitory threshold
for NO₃⁻ reduction, denitrification and anammox?>”°, consistent
with micromolar accumulations of NO₂⁻ from NO₃⁻ reduction in
this zone.
Figure 1 | Site description and phylogenetic

Diverse SAR11 SAGs from anoxic waters

Samples for SAG analysis were obtained from two depths in the 
anoxic zone: at 125 m at the NO₂⁻ maximum (6 µM), and at 300 m
in the core of the NO₃⁻ reduction zone. Single prokaryotic cells were
isolated by fluorescence-activated cell sorting, subjected to genome 
amplification’, and screened by 16S rRNA gene fragment (470 bp) 
polymerase chain reaction (PCR) and Sanger sequencing. From this 
screen, 23% and 32% of SAGs from 125 m and 300 m, respectively, were 
confidently assigned to the SAR11 family Pelagibacteraceae (Fig. 1b), 
thus confirming the substantial numerical abundance of SAR11 in the 
OMZ. From this SAR11 subset, 10 SAGs from 125 m and 12 SAGs from
300 m were randomly selected for shotgun sequencing (Illumina), along
with 5 technical control SAR11 SAGs from the oxic surface waters of
the Gulf of Mexico (GoM). After sequencing, quality filtering and
assembly, a total of 19 SAGs were used for analysis: 15 OMZ SAGs
(5 from 125 m, 10 from 300 m) and 4 GoM control SAGs
(Supplementary Table 1). These genomes exhibited varying 
levels of completeness (~2-90%; average 30%) and no detectable 
contamination (Extended Data Fig. 1), as assessed by the presence of 
single-copy housekeeping genes”*””, 16S rRNA gene identities, and the 
taxonomic assignment of SAG contigs (Supplementary Tables 1, 2 and 
Supplementary Discussion). 

The identified SAGs represented a diverse and novel SAR11 
community in the OMZ. Phylogenetic reconstructions based on 
either 16S rRNA genes or single-copy housekeeping proteins placed 
the 19 SAGs in 5 subclades of SAR11 (Fig. 2a). Average amino acid 
identity (AAI) comparisons among all available SAR11 genomes 
(Supplementary Table 3) further corroborated this classification, 
placing: (1) seven OMZ SAGs within the previously uncharacterized
deep-branching monophyletic group of subclade IIa (hereafter
designated subclade IIa.A), distinct (>5% 16S divergence) from
SAG HIMB058 from the tropical North Pacific (hereafter designated
subclade IIa.B); (2) three OMZ SAGs within the deep-branching
subclade IIb; (3) two OMZ SAGs within subclade Ic, which includes
recently described SAGs from the bathypelagic ocean®; (4) two OMZ
and all four GoM surface SAGs within subclade Ib, which thus far lacks
genome representatives; and (5) OMZ SAG A7 as most closely related to
HIMB59, a member of the divergent SAR11 subclade V (refs 8, 30, 31). Note that
the exact placement of subclade V in the SAR11 phylogeny is unstable
depending on the marker gene and outgroup used*””?. The average
estimated genome size of OMZ SAGs was 1.33 Mb (Supplementary
Table 1), consistent with prior reports of genome streamlining in
SAR11.


OMZ SAR11 abundance peaks under oxygen depletion

To estimate the in situ abundance and activity of OMZ SAR11, 
metagenome and metatranscriptome reads from OMZ sites and from 
diverse oxic ocean regions (Supplementary Table 4) were recruited to 
39 available SAR11 genomes (Supplementary Table 1). Metagenomic
read recruitment, performed essentially as described previously*4, 
showed that each OMZ SAR11 subclade represents a sequence-discrete 
(and hence tractable) population (Supplementary Discussion), but 
with each population encompassing substantial intra-population 
variation (~92-100% average nucleotide identity between members 
of the population versus <90% between populations), as well as gene 
content variability (Extended Data Fig. 2). We therefore estimated 
SAR11 abundance at the subclade level, based on the average coverage 
of 507 genes shared between genomes from all SAR11 subclades. On 
the basis of this analysis, SAR11 subclades Ic, IIa.A and IIb together
comprised about 10-30% of the bacterial community in ETNP and
ETSP metagenomes and metatranscriptomes from depths with
undetectable O₂ (Fig. 2b, c), consistent with the high abundance of
SAR11 in the pool of cells sorted for SAG analysis (Fig. 1b). Subclade
IIa.A, composed exclusively of seven SAGs from this study, was particularly
abundant, constituting up to 15% of the community in anoxic samples.
All OMZ subclades were absent from or much less abundant (<5%) in 
metagenomes from oxic sites, including those from above the ETNP 
OMZ (Fig. 2b). Together, these results identify newly described SAR11 
subclades whose distribution is linked to an oxygen-depleted niche. 


Metabolic adaptations to low oxygen in SAR11 genomes

OMZ and GoM SAGs were then analysed for evidence of microaero- 
bic or anaerobic metabolism. Surprisingly, in 8 of the 15 OMZ SAGs, 
belonging to SAR11 subclades Ic, IIa.A, IIb and V, protein family-based
classification detected genes encoding the respiratory Nar of the DMSO 
reductase superfamily (Fig. 2a). Evidence of a complete canonical
nar operon (narGHJI) was found within a single assembled contig in four
SAGs (A6, E4, D9, A7), while partial narG and narH fragments were
identified in another four SAGs (Extended Data Fig. 3). This operon
encodes the α subunit that catalyses NO₃⁻ reduction to NO₂⁻ (NarG),
the iron-sulfur-containing β subunit (NarH) that transfers electrons to
the molybdenum cofactor of NarG, the transmembrane cytochrome b-like
γ subunit (NarI) involved in electron transfer from membrane quinols to
NarH, and the NarJ chaperone involved in enzyme formation.
In all SAR11 SAGs containing nar on a contig, we identified
other genes upstream or downstream on the same contig taxonomi- 
cally assigned to SAR11 reference genomes (Supplementary Table 5 and 
Supplementary Discussion), further confirming the association of nar 
with SAR11. Genes encoding the NO₃⁻/NO₂⁻ transporter NarK and
proteins for biosynthesis of the essential molybdenum cofactor (moeA, 
mobA) were also identified in eight and five of the SAGs, respectively 
(Supplementary Table 1). In only four of the fifteen OMZ SAGs were 
nar or cofactor synthesis genes not detected, presumably due to 
sequencing gaps (completeness of these SAGs: 4-20%; Supplementary 
Table 1). In contrast, these genes were not detected in any of the four 
control SAGs from the oxic GoM, despite high completeness of those
genomes (average 61%). Genes encoding for downstream steps of
denitrification or other dissimilatory anaerobic metabolisms were
not found in any of the SAGs. However, in contrast to all previously
analysed SAR11 genomes, three of the OMZ SAGs, all from subclade
IIa.A, also contained genes encoding high-affinity O₂-using bd-type
terminal oxidases (Supplementary Table 1). Compared with the coxI-type
oxidases present in all known SAR11 genomes, including the OMZ
SAGs analysed here, bd-type oxidases have a much higher affinity for
O₂ (3-8 nM; Supplementary Discussion), suggesting a potential for
microaerobic respiration by OMZ SAR11. These results provide the
first indication of adaptation to low oxygen in SAR11 and the ability
to respire NO₃⁻ to NO₂⁻ in the absence of oxygen, consistent with the
distribution of these bacteria in the OMZ water column.

Figure 2 | Diversity, abundance and transcription of nitrate-reducing
SAR11. a, Maximum likelihood phylogeny based on the concatenated
alignment of single copy housekeeping (left) and 16S rRNA (right) genes
in SAGs from this study, SAR11 and representative alphaproteobacterial
genomes. Values in parentheses denote the number of housekeeping genes
used per genome. For the 16S-based tree, only full-length sequences from
the genomes in the left tree were included. Star symbols of the same colour
represent closely related narG genes (>97% amino acid identity), encoding
the catalytic subunit of the respiratory nitrate reductase of the DMSO


Multiple divergent Nar proteins in OMZ SAR11

Phylogenetic placement of all identified narG and narH genes and 
partial fragments revealed two divergent nar variants in OMZ 
SAGs (Fig. 3a and Extended Data Fig. 3): (1) an ‘OP1-type’ variant in which
all four nar genes and an upstream cytochrome c protein were
most similar (56-78% amino acid identity) to homologues from
‘Candidatus Acetothermus autotrophicum’ (Supplementary Table 5),
a putative anaerobic acetogen of the candidate bacterial phylum OP1 
(ref. 35); and (2) a ‘Gamma-type’ variant most similar (51-78% identity) 
to Nar from a denitrifying Gammaproteobacteria endosymbiont 
(Ca. Vesicomyosocius okutanii strain HA)°®. At least two of the OMZ 
SAR11 SAGs from subclade IIa.A, as well as SAG A7 from subclade
V, encoded both OP1- and Gamma-type nar variants, suggesting that 
divergent nar copies (~42% amino acid identity) co-occur in the same 
genome (Supplementary Discussion). Multiple nar operons per genome 
have been reported for diverse bacteria and are hypothesized to be 
related to adaptation to different oxygen conditions, with one variant 
constitutively expressed at low baseline levels*”*°. For both OP1- and 
Gamma-type variants, the sequence divergence among recovered 
sequences was consistent with the phylogenetic placement of the SAGs. 
For example, OP1-type narG fragments represented three distinct 97% 
amino acid identity clusters (Fig. 2a). Sequences from clade IIa.A SAGs
fell within the same cluster, sharing ~96.5% identity with sequences of
the closely related Ic and IIb subclades, and ~90% with sequences from
the more distant A7 SAG (Extended Data Fig. 3). This pattern suggests
diversification of nar operons in parallel with their genomic background,
and also confirms that these sequences are not a systemic contaminant 
(Supplementary Discussion). 


Biochemical characterization of SAR11 Nar

We sought to characterize further the biochemical function of SAR11 
nar genes. Phylogenetic reconstruction based on 392 proteins of the 
diverse DMSO superfamily revealed that both OP1- and Gamma-type
NarG fall within the clade of membrane-bound cytoplasm-oriented
Nar and NO₂⁻ oxidoreductases (Nxr), and were most closely related to
Nar from known NO₃⁻-reducing bacteria (Fig. 3a)“°. The lack of a TAT
peptide motif at the N terminus corroborated the probable cytoplasmic 
orientation of the NarG active site*!, similar to experimentally verified 
Nar in Escherichia coli*”. Additionally, the identified NarG sequences 
contain diagnostic functional domains found in NarG but not in other 
oxidoreductases of the DMSO reductase superfamily (Extended Data 
Fig. 4)*°.

To verify NO₃⁻ reduction potential in SAR11, we introduced full-length
SAR11 nar operons into a NO₃⁻ reductase-deficient E. coli
mutant and tested for enzyme activity. The Gamma-type nar operon
was successfully expressed in E. coli, yielding Nar proteins of the
predicted size range and enabling growth of the mutant under anoxic
conditions in the presence of NO₃⁻, coupled with simultaneous NO₃⁻
reduction to NO₂⁻ (Extended Data Fig. 5), thereby providing direct
evidence for the function of this enzyme in vivo. The OP1-type operon
did not reverse the E. coli mutant phenotype, presumably due to the
much greater divergence of this variant from the E. coli nar operon.
Given the high similarity of Nar and Nxr protein sequences**, and
the reversibility of the NO₃⁻ reduction reaction, it is possible that either
or both OP1- and Gamma-type proteins could also function in situ
to oxidize NO₂⁻ aerobically. Although it is enticing, this possibility is
remote given the experimental and phylogenetic evidence, a positive
relationship between NO₃⁻ reduction rates and the abundance of OP1-
and Gamma-type genes and transcripts in the anoxic OMZ depths
(Fig. 1c), and prior results showing O₂ sensitivity of OP1-type nar
transcription” (Supplementary Discussion). Rather, the results strongly
suggest that the identified SAR11 narG genes encode functional
NO₃⁻ reductases.

Figure 3 | Diversity, abundance and transcription of Nar enzymes in
the OMZ. a, Phylogenetic reconstruction of NarG sequences identified
in the SAR11 SAGs and metagenomic SAR11 contigs (ETNP prefix),
along with reference Nar and Nxr enzymes. Partial gene sequences
(represented with coloured pies) were subsequently added to the
pre-constructed tree with phylogenetic placement. aa, amino acid.
encode the enzyme. nt, nucleotide. c, Relative expression of NarG/NxrA
proteins in the ETNP transcriptomes.


SAR11 nar is abundant and highly transcribed

We next examined the abundance of SAR11-affiliated nar genes 
within the OMZ to evaluate the contribution of SAR11 cells to NO₃⁻
reduction. We first identified nar sequence reads in OMZ metagenomes
using a similarity search-trained model that discriminates NO₃⁻
reductase (or NO₂⁻ oxidoreductase) reads from those of other genes
of the DMSO superfamily (Supplementary Discussion). These narG 
reads were then classified within a reference phylogeny containing 
320 NarG proteins, including OP1- and Gamma-type sequences. 
Remarkably, the majority of narG reads from OMZ metagenomes were
classified as OP1- or Gamma-type Nar enzymes (Fig. 3b and Extended
Data Fig. 6a), with the two variants accounting for 70% of total narG
sequences at anoxic depths (Supplementary Table 4). Such high
representation is consistent with quantitative (q)PCR-based counts
of OP1- and Gamma-type narG copies at the collection site, where
the two variants (summed) spiked at the OMZ NO₂⁻ maximum at
>200,000 copies ml⁻¹ (Extended Data Fig. 6b). The average number
of nar genes per cell (that is, genome equivalents) was estimated
by comparing the abundance of nar sequences with those of rpoB,
a universal single-copy gene. On the basis of those estimations,
Gamma- and OP1-type nar variants occur in up to 61% and 85% of OMZ
bacteria, respectively (Fig. 3b and Extended Data Fig. 6b), assuming 
each nar type occurs once per genome. Such high values are striking 
but consistent with prior results taken from Basic Local Alignment 
Search Tool (BLAST)-based taxonomic assignments'®. These values 
also exceed the estimated SAR11 abundances in the metagenomes, 
or those calculated directly from SAG 16S screening (up to 32% of 
the community), indicating that these gene variants occur in multiple 
copies per genome or in diverse bacteria (Supplementary Discussion). 
Metagenomic evidence suggests that the majority of these nar operons 
are found in SAR11 genomes within the OMZ. First, while our SAG 
collection captured only a fraction of total nar diversity, additional nar 
operons were identified in metagenomic contigs classified as SAR11 
(Extended Data Figs 3, 7 and Supplementary Table 6). Second, the 
majority of the metagenomic narG reads showed >95% nucleotide 
identity with the narG genes encoded by the SAGs, suggesting that 
SAR11 cells are among the major contributors of Nar enzymes in the 
OMZ (Fig. 2b). 
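
The genome-equivalent estimate described above (nar abundance relative to the single-copy gene rpoB) can be sketched as a simple ratio. This is an illustration under stated assumptions rather than the procedure used in the study: normalizing read counts by gene length is an assumption, the function names are hypothetical, and any input numbers supplied to it are the user's own.

```python
def length_normalized_abundance(read_count, gene_length_bp):
    """Reads per base pair of gene, so genes of unequal length are comparable."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return read_count / gene_length_bp


def genome_equivalents(nar_reads, nar_length_bp, rpob_reads, rpob_length_bp):
    """Estimated nar copies per genome, using rpoB as a one-copy-per-genome reference."""
    rpob = length_normalized_abundance(rpob_reads, rpob_length_bp)
    if rpob == 0:
        raise ValueError("rpoB abundance must be non-zero")
    return length_normalized_abundance(nar_reads, nar_length_bp) / rpob
```

Under the one-copy-per-genome assumption stated in the text, a value of 0.5 would be read as the nar variant occurring in about half of the cells; values that exceed the estimated abundance of the host population point to multiple copies per genome or to carriage by other taxa.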

Metatranscriptome sequencing confirmed that SAR11-affiliated 
nar genes are transcribed in the OMZ. The abundance of both OP1- 
and Gamma-type variants in ETNP metatranscriptomes increased 
steadily from the lower oxycline (85 m) to the OMZ core (300m), 
directly paralleling the abundance of the respective genes and the 
depth trend in NO₃⁻ reduction rates (Fig. 1c). Notably, within the
ETNP OMZ, an average of 39% of all narG transcripts shared >95% 
nucleotide identity with the OP1- or Gamma-type sequences detected 
in SAR11 SAGs (Fig. 3c), a conservative lower-bound estimate of the 
contribution of SAR11 bacteria to the total nar transcripts within the 
OMZ. Accordingly, within the anoxic OMZ depths, nar genes are 
among the most transcriptionally active genes in the SAG genomes 
(Extended Data Fig. 8). The high transcriptional activity of SAR11 
nar operons, interpreted alongside their distribution relative to NO₃⁻
reduction rates, suggests that SAR11 bacteria contribute substantially to
community NO₃⁻ respiration.


Conclusions 

Collectively, our findings identify diverse and abundant SAR11 lineages 
whose genome content and environmental distribution reflect adaptation
to an anoxic niche, unlike all other SAR11 bacteria characterized
to date. The experimentally verified NO₃⁻ reductase activity in
the Gamma-type SAR11 nar variant, along with the high expression
levels of divergent SAR11 nar genes in the functionally anoxic core
of the OMZ, suggests that persistence in this niche is linked to NO₃⁻
respiration, consistent with the fundamental importance of this process
in OMZs. Nitrate respiration in OMZs constitutes the primary mode
for organic carbon mineralization and the main production route
of NO₂⁻, a critical substrate for the major nitrogen loss processes
of anammox and denitrification. The presence and activity of nar 
operons in SAR11, as well as the high abundance of nar-associated 
SAR11 clades in the OMZ, implicate these versatile organisms as 
major contributors to the initiation of OMZ nitrogen loss. Together, 
these findings redefine the ecological niche of one of the planet’s most 
dominant groups of organisms, providing a set of genomic references to 
establish SAR11 as a model for studies of nitrogen and carbon cycling 
in OMZs. 
METHODS
Collection of ETNP and GoM samples for SAG analysis. No statistical methods 
were used to predetermine sample size. Selection of the SAR11 SAGs was rand- 
omized. The investigators were not blinded to allocation during experiments and 
outcome assessment. Samples were collected from the ETNP OMZ during the 
Oxygen Minimum Zone Microbial Biogeochemistry Expedition (OMZoMBiE)
cruise (R/V Horizon, 13-28 June 2013). Sea water for single-cell sorting and sin- 
gle amplified genome (SAG) analysis was collected from two depths within the 
OMZ (125 m and 300 m) at station 6 (18° 54.0′ N, 104° 54.0′ W) on 19 June (Fig. 1).
Additional (‘control’) samples were collected from a depth profile (1-2,107 m) of 
the Gulf of Mexico (GoM) on 29 May 2012 aboard the R/V Endeavour (cruise 
EN509) at station 5, with samples for SAG analysis preserved from the oxic sur- 
face (1m). Collections were made using Niskin bottles on a rosette containing a 
conductivity-temperature-depth profiler (Sea-Bird SBE 911plus). Water samples 
were prepared by cryopreservation according to the protocol recommended by the 
Bigelow Single Cell Genomics Center. Briefly, triplicate 1 ml samples of bulk sea 
water (no prefiltration) were gently mixed with 100 µl of a glycerol TE stock solution
(20 ml 100× TE pH 8.0, 60 ml sterile water, 100 ml glycerol) and frozen at −80 °C.
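
As a worked check of the cryopreservation recipe above, the final glycerol concentration in each preserved sample can be computed from the stated volumes. The sketch assumes that volumes are additive on mixing, which is only approximately true for glycerol and water; the variable names are hypothetical.

```python
# Stock recipe from the text, in ml.
TE_100X_ML = 20.0
WATER_ML = 60.0
GLYCEROL_ML = 100.0

# Per-sample volumes from the text, in microlitres.
SAMPLE_UL = 1000.0  # 1 ml of bulk sea water
STOCK_UL = 100.0    # glycerol TE stock added to each sample

stock_total_ml = TE_100X_ML + WATER_ML + GLYCEROL_ML
glycerol_fraction_in_stock = GLYCEROL_ML / stock_total_ml

# Final volume fraction of glycerol, assuming additive volumes.
final_glycerol_fraction = (STOCK_UL * glycerol_fraction_in_stock) / (SAMPLE_UL + STOCK_UL)

print(f"stock glycerol: {glycerol_fraction_in_stock:.1%} (v/v)")
print(f"final glycerol: {final_glycerol_fraction:.1%} (v/v)")
```

Under these assumptions the stock is a little over half glycerol by volume and the preserved sample contains roughly 5% glycerol (v/v).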
ETNP OMZ rate measurements, and oxygen and nutrient analysis. Samples 
for oxygen and nutrient measurements were collected on the same date and casts 
as those for single-cell sorting described above. Samples for rate measurements 
and metagenomics/transcriptomics (below) were collected a few hours later on 
the same day. Detailed collection and analysis procedures for those samples have 
been previously described’. Briefly, oxygen concentrations were determined using 
rosette-mounted sensors, including a SBE43 dissolved oxygen sensor for micro- 
molar sensitivity and a high-resolution switchable trace amount oxygen (STOX) 
sensor for nanomolar-level measurements*®. CTD-based oxygen measurements 
(SBE43) from three casts spanning this sampling period revealed no detectable 
movement in the oxycline, indicating stability in water column conditions. 
Metagenome and metatranscriptome samples. Metadata, sequencing statistics, 
and accession numbers of all analysed metagenome and metatranscriptome data 
sets are in Supplementary Table 4. Here, we summarize the OMZ and GoM data 
sets at the core of our analysis. ETNP OMZ metatranscriptomic and metagenomic 
data sets were generated via MiSeq Illumina sequencing as described in ref. 19 
and ref. 47, respectively, for 5 depths at station 6: the upper oxycline (30m), lower 
oxycline (85 m), secondary chlorophyll maximum (100 m), secondary nitrite maxi- 
mum OMZ (125m) and OMZ core (300 m) (Supplementary Table 4). Metagenome 
data sets from the ETSP were generated by Roche 454 pyrosequencing as previ- 
ously described" for 4 depths at an OMZ site (20° 05S, 70° 48W) off the coast of 
Iquique, Chile: the suboxic (<10 µM) upper OMZ just below the oxycline (70 m),
the anoxic OMZ core (110m, 200m), and the oxic zone below the OMZ (1,000m).
The ETNP and ETSP data sets analysed here reflect the 0.2-1.6 µm biomass size
fraction; this fraction was shown to contain the vast majority of bacterioplankton
and SAR11 cells!*. We also included two additional metagenomes, sampled on
5 May 2014 from the same site (station 6) in the ETNP, in order to obtain full- 
length nar operons for cloning purposes (see below). These metagenomes were 
obtained from depths of 68 m within the oxycline and 120 m within the OMZ. For 
the 9 GoM metagenomes released with this study, samples were collected from 
Niskin bottles (60 l per depth), and filtered on board using the same filtration
systems as for the ETNP and ETSP metagenomes (0.2-1.6 µm fraction). DNA
was extracted with the same protocol as for the OMZ samples" and libraries were 
prepared and sequenced in two lanes on an Illumina HiSeq (150 bp paired reads). 
All metagenomic and metatranscriptomic data sets were quality trimmed as 
described below for the SAG data sets. The metatranscriptomic data sets were 
further filtered to remove rRNA transcripts using the SortMeRNA algorithm. 
The Roche 454 metagenomic data sets were filtered to remove
duplicate sequences. The quality-trimmed reads from the OMZ metagenomes
(ETNP and ETSP) were assembled with IDBA*®’ and genes were predicted on
contigs longer than 500 bp with MetaGeneMark.hmm”°. Taxonomic classifica- 
tion of metagenomic contigs was performed with MyTaxa°!. nar operons were 
identified on metagenomic contigs as described below for the SAG assemblies. 
SAG isolation and taxonomic characterization. Single amplified genomes 
(SAGs) were generated from individual bacterial cells, according to standard 
procedures in the Department of Energy Joint Genome Institute workflow”’. 
Briefly, individual cells sorted on a BD Influx (BD Biosciences) were treated with 
Ready-Lyse lysozyme (Epicentre; 5 U/µl final concentration) for 15 min at room
temperature before the addition of lysis solution. Whole-genome amplification
was performed with the REPLI-g Single Cell Kit (Qiagen) in 2 µl reactions set up
with an Echo acoustic liquid handler (Labcyte). Only the lysis and stop reagents 
from the REPLI-g kit received UV treatment since the amplification cocktail was 
pre-treated by the manufacturer. Amplification reactions were terminated after 6h. 
PCR amplification and Sanger sequencing of a ~470 bp region of the 16S rRNA
gene (amplified using primers 926wF (5′-AAACTYAAAKGAATTGRCGG-3′)
and 1392R (5′-ACGGGCGGTGTGTRC-3′) for archaea and bacteria) was used to
assign a preliminary taxonomic identification to each of the SAGs, via comparisons
to the Greengenes rRNA database.

SAG sequencing. A total of 27 SAGs classified as SAR11 were randomly
selected for sequencing, including 10 and 12 SAGs from 125m and 300m in
the ETNP, respectively, and 5 ‘control’ SAR11 SAGs from surface water (1 m) in
the GoM. SAG DNA was prepared using the NexteraXT DNA Sample Prep kit 
(Illumina, San Diego, CA, USA) following the manufacturer's instructions. Libraries 
were pooled and sequenced at Georgia Tech on two runs of an Illumina MiSeq using 
a 500 cycle (paired end 250 × 250 bp) kit. Of the initial 27 SAGs, 8 were recovered in
very low abundance in the read data or were removed due to potential contamina- 
tion (>5%) as estimated with CheckM (see below) or the presence of 18S rRNA gene 
fragments, yielding the final set of 19 SAGs analysed here (Supplementary Table 1). 
SAG sequence quality control, assembly and functional gene annotation.
Coupled reads were merged, when overlapping, using PEAR™. Both merged and 
un-merged reads were trimmed using SolexaQA+-++*° with a PHRED score cutoff 
of 20 and a minimum fragment length of 50 bp. Illumina adaptors were clipped 
using Scythe (https://github.com/vsbuffalo/scythe) and reads were re-filtered for 
length (50 bp). Quality-trimmed reads were assembled with SPAdes*. Percentage 
of contamination and genome completeness were assessed based on recovery 
of lineage-specific marker gene sets using CheckM”. From the total of 27 SAG 
assemblies, 7 were excluded from the analysis due to low coverage (that is, less 
than 70 kb) or the presence of 18S rRNA sequences and BLASTP top matches 
to eukaryotic sequences reflecting contamination. For the remaining SAGs that 
passed the original quality control thresholds (Supplementary Table 1), when 
multiple fragments of a bacterial single-copy marker gene were identified, manual 
inspection of alignments revealed that multiplicity was due to assembly breaking 
points rather than contamination from divergent sequences, and such cases were 
retained for analysis (Supplementary Table 2). Evidence for contamination was 
detected in only one SAG, SAG A2 from the GoM, as multiplicity of divergent and 
nearly full-length marker genes. This SAG was excluded from further analysis. 
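
The read-trimming criteria given above (PHRED score cutoff of 20, minimum fragment length of 50 bp) can be illustrated with a minimal sketch. It assumes the common strategy of keeping the longest contiguous run of bases at or above the quality cutoff; this is an illustration only and not the code of the trimming software cited in the text, and the function name is hypothetical.

```python
def trim_read(seq, quals, min_q=20, min_len=50):
    """Return the longest contiguous segment with all PHRED scores >= min_q.

    Returns None when that segment is shorter than min_len.
    `quals` is a sequence of integer PHRED scores, one per base of `seq`.
    """
    best_start, best_len = 0, 0
    run_start = None
    # A sentinel score of -1 closes a run that reaches the end of the read.
    for i, q in enumerate(list(quals) + [-1]):
        if q >= min_q:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    if best_len < min_len:
        return None
    return seq[best_start:best_start + best_len]


# Hypothetical read: 10 low-quality bases, 60 good bases, 5 low-quality bases.
read = "A" * 10 + "C" * 60 + "G" * 5
scores = [10] * 10 + [30] * 60 + [5] * 5
print(trim_read(read, scores))
```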

For the final data set of 19 SAGs, coding sequences were predicted on scaffolds 
longer than 500 bp with GeneMark.hmm™ and 16S rRNA gene sequences were 
identified using RNAmmer”’. 16S rRNA sequences identified in the assemblies 
(4/4 GoM SAGs, and 8/15 OMZ SAGs) were compared to the 470 bp 16S fragment 
obtained during the initial SAG screening and confirmed to be identical. As an 
additional quality control step, all predicted genes from the 19 SAGs were taxo- 
nomically annotated using MyTaxa”! and the taxonomic distributions of adjacent 
genes in the concatenated assembly (10 gene windows) were inspected for possible 
contamination. As discussed in Supplementary Discussion, a contaminant genome 
in the assembled contigs can be visualized in the MyTaxa scan plots (Extended 
Data Fig. 1). 

Predicted genes were functionally annotated using the blast2go pipeline*® for 
assignment to metabolic pathways, and screened manually for evidence of anaerobic 
energy metabolism. Detected genes of anaerobic metabolism, including nitrate 
reductase (nar) genes, as well as terminal oxidase genes and the single-copy marker 
gene rpoB, were further verified using HMMER3 (http://hmmer.janelia.org/) 
with default settings and recommended cutoffs for a match against available Pfam 
models™. Statistics of SAG quality control, assemblies, and contamination testing 
are in Supplementary Tables 1 and 2.

Phylogenetic placement of SAGs. The evolutionary relatedness of SAR11
SAGs was assessed using the identified full or almost full-length 16S rRNA gene 
sequences from the assembled SAGs. For the SAGs from which no full-length 
16S rRNA fragments were assembled, the shorter fragments obtained during 
screening were used in pairwise comparisons with full-length sequence references 
(Supplementary Table 3, 16S matrix). The 16S rRNA sequences from publicly 
available SAR11 genomes, as well as previously published 16S sequences®'* from 
subclades with no genome representatives, were included in the alignment to aid 
in the classification of the SAR11 subclades. Additionally, genome representatives 
of divergent alphaproteobacteria classes, as well as a beta- and gammaproteobacte- 
rium were included to facilitate the rooting of the tree. Maximum likelihood phy- 
logenetic reconstruction was performed with RAxML with 1,000 bootstraps and 
the GTR model for nucleotides®. Additionally, hidden Markov models (HMMs) 
of 106 housekeeping genes found in single copy in bacterial genomes were used to 
identify marker genes in available SAGs and reference genomes using HMMER3 
(http://hmmer.janelia.org/) with default settings and the recommended cutoff*. 
The identified marker genes (Supplementary Table 1) were aligned using Clustal 
Omega®! and the protein alignments concatenated using Aln.cat.rb from the 
enve-omics collection (http://enve-omics.ce.gatech.edu/) to remove invariable 
sites and maintain protein coordinates. The concatenated alignment was used to 
build a maximum likelihood phylogeny with RAxML, using 1,000 bootstraps, and 
the PROTGAMMAAUTO function, which identifies the best amino acid substitution
model for each protein. SAGs were assigned to SAR11 subclades based
on the consensus categorization of both 16S rRNA and marker gene phylogenies,
in accordance with previously published subclade identification sequences’.
OMZ-derived SAR11 SAGs from the SAR11 IIa lineage were further categorized as
subclade IIa.A, to differentiate them from the currently available reference SAR11
IIa representative (HIMB058), classified here as subclade IIa.B. Average amino acid
identities (AAIs) were estimated as described previously”.

nar functional gene validation and phylogeny. Reference nitrate reductase and 
nitrite oxidoreductase protein sequences (n = 697) representing divergent bacterial 
and archaeal phyla were downloaded from UniProt/Swiss-Prot®, together with 
representatives of other DMSO family oxidoreductases (n= 71), using as a guide 
the reference tree from ref. 64. From this 697-sequence set, 321 full-length NarG/ 
NxrA sequences were selected to represent all the clades, along with the 71 addi- 
tional non-NarG/NxrA proteins. The NarG/NxrA subset included the closest 
relatives to the SAG OP1 and Gamma-type Nar variants, as determined by BLAST. 
All protein sequences (n= 392), including the full-length NarG identified in the 
SAGs, were aligned with Clustal Omega, and a maximum likelihood phylogeny was 
reconstructed with RAxML with 1,000 bootstraps and the PROTGAMMAAUTO 
model. Partial fragments of the NarG protein were then added to the alignment 
using MAFFT’s ‘addfragments”®, and the evolutionary placement algorithm (EPA) 
implemented in RAxML was used to place them within the reference tree. The 
same procedure was followed for the phylogenetic reconstruction and placement 
of identified NarH protein sequences. 

Quantification of narG-encoding reads from the metagenomes and meta- 
transcriptomes was done using BLAST searches against a manually curated 
NarG database and the software ROCker (L. H. Orellana, L. M. Rodriguez-R and 
K. T. Konstantinidis, manuscript submitted). Using receiver-operator curve (ROC) 
analysis, ROCker identifies the most discriminant BLAST bit-score per position 
in a reference alignment (NarG database) given a certain read length by simulat- 
ing in silico metagenomic data sets that include the reference genes. This strategy 
permits the accurate estimation of abundance of target genes in short-read data 
sets, minimizing false negatives and positives derived from closely related proteins 
or conserved domains, a critical challenge in the detection of narG due to the 
ubiquity of other closely related DMSO oxidoreductases. The NarG database was 
manually curated and confirmed by the phylogenetic reconstruction of all available 
nitrate reductase and nitrite oxidoreductase sequences and visual inspection of the 
multi-sequence alignment for conservation of known functional domains and 
motifs. The final NarG database consisted of 697 nitrate reductases/nitrite oxidore- 
ductases (positive set) and 71 representative non-NarG/NxrA DMSO family 
proteins (negative set for identification of false positive BLAST matches). All data 
sets, as well as the ROCker models built for narG quantifications in metagenomes 
with different read lengths, are available at http://enve-omics.ce.gatech.edu/rocker/. 
Additionally, the model for the identification of rpoB fragments in metagenomes 
was used to estimate coverage of rpoB in metagenomes. 

The abundance of narG sequences in meta-omic data sets was estimated as 
genomic equivalents for each sample, by normalizing the coverage of narG for 
the gene length (reads per nucleotide of narG), and dividing the normalized 
value by the rpoB-normalized coverage (reads per nucleotide of rpoB) as shown 
in Supplementary Table 4. To quantify the abundance of the narG variants (OP1, 
Gamma-type), protein fragments were predicted in all identified (from ROCker) 
narG reads using FragGeneScan® and placed in the reference DMSO tree using 
RAxML-EPA. The abundances of the OP1-type or Gamma-type variants were
estimated based on the number of reads that were placed in the terminal or internal 
nodes of the aforementioned clades on the reference tree, using JPlace.to_iToL.rb 
from the enve-omics collection. The NarG metagenomic reads (predicted open 
reading frames (ORFs)) placed within those nodes, were used to construct the 
recruitment plots shown in Extended Data Fig. 8b. BLASTP was used to map 
the reads against the reference NarG sequences, and the recruitment plots were 
constructed with the BlastTab.catsbj.pl and BlastTab.recplot.R scripts from the 
enve-omics collection. 
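
The genome-equivalent normalization described above can be sketched as follows: narG coverage (reads per nucleotide of narG) is divided by rpoB coverage (reads per nucleotide of rpoB). The function name, the read counts and the gene lengths in the example are hypothetical and are used only to make the arithmetic concrete.

```python
def genome_equivalents(target_reads, target_len_nt, rpob_reads, rpob_len_nt):
    """Length-normalized target-gene coverage relative to rpoB coverage.

    A value of 0.5 would suggest roughly one target gene copy per two genomes,
    assuming rpoB is present in a single copy per genome.
    """
    if rpob_reads <= 0:
        raise ValueError("rpoB coverage must be positive to normalize against it")
    target_cov = target_reads / target_len_nt   # reads per nucleotide of target gene
    rpob_cov = rpob_reads / rpob_len_nt         # reads per nucleotide of rpoB
    return target_cov / rpob_cov


# Invented example values (not taken from Supplementary Table 4).
print(genome_equivalents(target_reads=370, target_len_nt=3700,
                         rpob_reads=800, rpob_len_nt=4000))
```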

Thus, the reported abundances of OP1 and Gamma-type narG in metagenomes/ 
metatranscriptomes are based on phylogenetic assignment of nar reads, rather than 
a strict sequence similarity cutoff. To estimate a lower limit for the abundance of 
NarG sequences presumably encoded by SAR11 genomes, the number of reads 
with more than 95% nucleotide identity to the reference NarG sequences found 
in the SAGs was estimated, and shown in Extended Data Fig. 6b, c. The figure 
shows abundance estimates for reads that are phylogenetically assigned to OP1 
and Gamma nodes, with partitioning of the data into reads that share less than 
and greater than 95% nucleotide identity with the SAG OP1 and Gamma-type 
references. 

NarG divergence in reference closed genomes. Identification of NarG in all closed 
genomes available from GOLD (27,461 bacterial and 685 archaeal genomes)°* 
was performed using HMMER3 with default settings. The results were further 
refined by a competitive BLAST search” against the custom-made NarG reference
database (used for ROCker), which included DMSO family oxidoreductase enzyme
reference sequences. Matches with best hit against NarG sequences and a bit 
score higher than 900 were annotated as nitrate reductases or nitrite oxidoreduc- 
tases. When found in multiple copies (up to 6), a reciprocal BLASTP search was 
performed to estimate sequence divergence, measured as amino acid identity. 
Quantification of SAR11 clades in metagenomes and metatranscriptomes. 
For each metagenome/metatranscriptome, reads potentially derived from 
SAR11 genomes were identified by a competitive BLAST best-match approach. 
A custom database was built using all available closed genomes from NCBI-ftp 
(2,638 bacterial, 165 archaeal) and 39 genome representatives of the SAR11 lineage,
including 20 published isolate or SAG sequences and the 19 SAG sequences pro- 
duced in this study (Supplementary Table 1). Metagenomic and metatranscrip- 
tomic reads (predicted ORFs with FragGeneScan) were then compared against 
the database using BLASTP, and the subset of reads with a best match against 
any of the SAR11 genomes and an e value < 0.001 was classified as ‘SAR11
reads’ (Supplementary Table 4). To quantify the relative abundance of distinct 
SAR11 subclades, the SAR11 reads were further classified as follows. We used the 
coverage of marker genes that could be found in all the subclades to more accu- 
rately estimate the abundance of distinct subclades and overcome both the biased 
representation of SAR11 subclades in the available genomes, and the partial nature 
of SAG genomes. For all 39 available SAR11 genomes, 5,707 orthologous genes 
(OGs) were identified by reciprocal best match and Markov clustering with infla- 
tion 1.5 using ogs.mcl.rb from the enve-omics collection. From the identified OGs, 
507 were represented at least once in each of the 8 subclades. All metagenomic 
and metatranscriptomic reads (SAR11 subsets) were mapped against the database 
containing all protein sequences from the 507 OGs (which were tagged according 
to subclade of origin) using the BLASTX option from Diamond” and only the best 
matches for each read were kept. The coverage of each OG for each subclade was 
estimated based on that competitive best match result, normalized for the gene 
length (reads per bp of each OG), and the average coverage of all 507 OGs was used 
to estimate the abundance of subclades. Additionally, the number of rpoB reads 
for each metagenome was identified (for either the total data set or the subset of 
the SAR11 reads), and the coverage of rpoB was used as a normalization factor to 
estimate the abundance of SAR11 subclades over the total bacterial community. 
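
The subclade abundance estimate described above (per-OG coverage from competitive best matches, normalized by gene length and averaged over the shared OGs) can be sketched as follows. The data structures, function name and toy numbers are assumptions made for illustration; OGs that recruit no reads are counted as zero coverage, which is one reasonable reading of the procedure.

```python
from collections import defaultdict


def subclade_abundance(read_hits, og_lengths):
    """Mean length-normalized coverage per subclade across its OGs.

    read_hits:  iterable of (subclade, og_id), one entry per read (best match only).
    og_lengths: dict mapping (subclade, og_id) -> gene length in bp.
    Reads whose (subclade, og_id) is absent from og_lengths are ignored.
    """
    counts = defaultdict(int)
    for subclade, og in read_hits:
        counts[(subclade, og)] += 1

    coverages = defaultdict(list)
    for (subclade, og), length_bp in og_lengths.items():
        coverages[subclade].append(counts[(subclade, og)] / length_bp)  # reads per bp

    return {s: sum(v) / len(v) for s, v in coverages.items()}


# Toy example with two OGs and two subclades (invented values).
lengths = {("IIa.A", "og1"): 1000, ("IIa.A", "og2"): 500,
           ("Ic", "og1"): 1000, ("Ic", "og2"): 500}
hits = [("IIa.A", "og1")] * 10 + [("IIa.A", "og2")] * 5 + [("Ic", "og1")] * 2
print(subclade_abundance(hits, lengths))
```

Dividing each value by the rpoB coverage of the same data set would then express the subclade abundance relative to the total bacterial community, as the text describes.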
Functional characterization of SAR11 nar operons. A previously constructed 
NO₃⁻ reductase-deficient Escherichia coli strain’! was used as the genetic system for
heterologous expression of SAR11 nar genes. We used whole-genome sequencing
(Illumina MiSeq) to confirm that this strain lacked all three NO₃⁻ reductases
(ΔnarGI ΔnapAB narZ::Ω; Extended Data Fig. 5). The phenotype of this strain,
hereafter referred to as the triple mutant, was verified by a lack of NO₂⁻ production
and an absence of growth with NO₃⁻ under anaerobic conditions, compared to the
wild-type MC4100 E. coli strain (Extended Data Fig. 5).

Complete sequences from one OP1-type, and one Gamma-type nar operon, 
containing upstream and downstream sequences, were identified from the ETNP 
300m and ETNP 120m metagenomes (see above). These sequences were confirmed 
to be identical to the operons in SAG A7 (which was lacking part of the N terminus 
of the narG gene; Extended Data Fig. 3). Purified DNA from the ETNP 300m and 
ETNP 120m metagenomic samples was used as template for PCR amplification. 
In addition, we used genomic DNA from E. coli strain K12 MG1655 as a posi- 
tive control. Because metagenomic samples are usually fragmented and the entire 
nar operon is 6.9 kb, primers were designed to amplify the OP1-type, Gamma-type
and E. coli wild-type operon in two blocks. The first block spanned from the
native NarG ribosome binding site to the end of the narG gene, and the second 
block included the end of the narG gene to the narI stop codon. The resulting PCR 
products were gel purified, assembled and cloned into pBbA1k, a low-copy vector
including the IPTG-inducible pTrc promoter” by In-Fusion cloning (Clontech,
Mountain View, CA). The cloning reactions were transformed into TOP10 cells,
and inserts were sequence-verified by PacBio sequencing (Pacific Biosciences,
Menlo Park, CA). The final nar sequences were identical (OP1 operon, and NarG,| 
proteins of Gamma operon) or nearly identical with silent substitutions (99% and 
98% AAI for the Gamma-type NarG and H proteins) compared to the sequences 
from SAG A7 (GenBank accessions KX275213, KX275214). Correct clones were 
isolated for each operon type, and purified plasmid was used to electroporate 
the triple mutant E. coli strains described above to generate recombinant strains 
expressing the heterologous nar operons for functional characterization. 

For anaerobic cultures performing NO₃⁻ respiration, strains were first induced
in LB medium with 0.5 mM IPTG for 5h, and 20 µl of inoculum was subsequently
introduced in gas tight tubes under N₂ atmosphere. The medium was prepared
as previously described”*, composed of potassium phosphate buffer (100 mM,
pH 7.4), 15 mM (NH₄)₂SO₄, 9 mM NaCl, 2 mM MgSO₄, 5 µM Na₂MoO₄, 10 µM
Mohr’s salt, 100 µM CaCl₂, 0.5% casaminoacids and 0.01% thiamine. Glycerol
(40 mM) was used as the sole carbon source, and NO₃⁻ was added at 30 mM. IPTG
(0.5 mM), kanamycin (30 µg/ml) and streptomycin (30 µg/ml) were used with the
recombinant strains. Samples for NO₃⁻ and NO₂⁻ concentrations were obtained
at regular time intervals during incubations, filtered through 0.2 µm porosity filters
and injected into a Dionex DX ion chromatography unit with the Dionex IonPac
AS14A analytical column”. Growth in incubations was assessed as optical density
(OD600 nm). Growth curve data from replicated cultures (triplicate) were fitted to
a logistic model with variables r (specific growth rate), P0 (initial population) and
K (carrying capacity), using nonlinear least-squares estimates and prediction of 
OD per time point with confidence intervals as implemented in enve.growthcurve 
from the enve-omics collection (http://enve-omics.ce.gatech.edu/). 
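
For readers unfamiliar with the logistic growth model named above, a minimal sketch is given here. It assumes the standard three-parameter form P(t) = K / (1 + ((K − P0)/P0)·e^(−rt)); the exact parameterization and fitting routine used by enve.growthcurve may differ, and the function names below are hypothetical. A nonlinear least-squares fit would search for the r, P0 and K that minimize the residual sum computed by the second function.

```python
import math


def logistic(t, r, p0, k):
    """Logistic growth: population (for example, optical density) at time t."""
    return k / (1.0 + ((k - p0) / p0) * math.exp(-r * t))


def sum_squared_residuals(times, observed, r, p0, k):
    """Objective minimized by a least-squares fit of the logistic model."""
    return sum((obs - logistic(t, r, p0, k)) ** 2
               for t, obs in zip(times, observed))


# Invented example: evaluate the model and the objective on noise-free data.
times = [0, 2, 4, 8, 16, 32]
density = [logistic(t, r=0.5, p0=0.01, k=1.0) for t in times]
print(sum_squared_residuals(times, density, r=0.5, p0=0.01, k=1.0))
```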

Nitrate reductase activity was further verified in cell lysates from cells grown
anaerobically for 12 days. Cells resuspended in 100 mM sodium phosphate buffer
(pH 7.2) containing 0.02% Tween 80 were lysed by sonication in a Bioruptor UCD-200
(Diagenode). Protein concentration of the cell lysate was determined using a
Qubit 2.0 fluorometer (Thermo Fisher Scientific) and 100 µg of protein was added
to a reaction containing 100 mM NaNO₃ and benzyl viologen as electron donor.
The reaction was bubbled with N₂ for 2 min before initiation with the addition of
50 µl of 30 mM sodium dithionite in 10 mM NaOH (final volume: 500 µl). Aliquots
(50 µl) were removed at 20 min intervals and NO₂⁻ concentration determined
colorimetrically after the addition of 50 µl Griess reagent (prepared with equal
volumes of 0.1% N-1-naphthylethylenediamine dihydrochloride in water and 1%
sulfanilamide in 5% phosphoric acid). All assays were performed in triplicate.
Finally, NO₂⁻ production from NO₃⁻ was further confirmed using whole cell assays
with 8 replicate clones (per recombinant strain) grown aerobically on 96-well
plates in 70 µl Luria-Bertani (LB) broth supplemented with 30 mM NO₃⁻ and
various IPTG concentrations. Nitrite production was identified via the Griess
reaction as described above.
Quantitative PCR of 16S rRNA genes and SAR11 nar variants. Quantitative 
PCR (qPCR) was used to count OP1- and Gamma-type narG and total bacterial 
16S rRNA gene copies. Sea-water samples for qPCR were collected in 2014 from 
three sites in the ETNP, including station 6 from which the SAG samples were 
obtained. 

Primers for narG PCR were designed based on alignments of narG sequences 
recovered from OMZ SAGs, targeting sites inclusive of all OMZ SAR11-affiliated 
nar variants and exclusive of narG from the closest database reference sequences. 
Primer selection resulted in the following: GammaF, 5′-GCGTAAAATAATTTCTTCTCCTACATGGA-3′;
and GammaR, 5′-AGTTCAATCCAGTCATTATCTTCTACATC-3′, amplifying a 401-nucleotide
fragment of the Gamma-type nar; and OP1F, 5′-ACCATCAAGGAATAAGAGAATTAGG-3′; and OP1R,
5′-TGGATTCCGTTTTCACAATACATTTC-3′, amplifying a 288-nucleotide
fragment of the OP1-type nar. PCR reactions were performed with DNA tem- 
plate from the OMZ 300 m sample (station 6) and the oxic Gulf of Mexico as 
a negative control with the following conditions: incubation at 50°C for 2 min, 
95°C for 10 min, followed by 40 cycles of denaturation at 95°C (15s) and anneal- 
ing at 53°C (for OP1) and 54°C (for Gamma) (1 min each). Amplicons with the 
expected length were observed only in the OMZ sample and were purified and 
concentrated using the QIAquick PCR purification kit (Qiagen). Clone libraries
were prepared with the TOPO TA cloning kit (Life Technologies) following the 
manufacturer's protocol, and plasmids from overnight grown selected colonies 
were isolated with the PureLink Quick Plasmid Miniprep Kit (Life Technologies). 
Inserts were purified using the QIAquick PCR Purification kit and sequenced on 
an Applied Biosystems 3730xl DNA Analyzer using BigDye Terminator v.3.1 cycle 
Sanger sequencing (Life Technologies). Sequencing recovered 14 sequences gen- 
erated using OP1 primers and 12 generated using Gamma primers. All OP1-like 
sequences were most closely related (via BLASTX against the NCBI-nr database) 
to narG of an uncultured Acetothermia bacterium OP1 (dbj|BAL57372.1|), whereas
all Gamma-like sequences were most closely related to the gammaproteobacterial
endosymbiont of Calyptogena okutanii (Ca. Vesicomyosocius okutanii;
ref|WP_011930032.1|), consistent with the phylogenetic classification of the recovered
SAG nar sequences as described in the main text and confirming the specificity
of the primer sets. However, sequences within each clone set shared on average
96% (OP1 set) and 93% (Gamma set) nucleotide identity, raising the possibility
that our primer sets may not amplify all OP1 and Gamma-type nar variants in the 
community. We therefore consider our abundance estimates to be lower bounds. 

The OP1 and Gamma primer sets, along with universal bacterial 16S rRNA gene 
primers 1055f and 1392r, were used for SYBR Green-based qPCR. Tenfold serial 
dilutions of DNA from a plasmid carrying narG amplicons (described above) and 
a single copy of the 16S rRNA gene (from Dehalococcoides mccartyi) were included 
on each qPCR plate and used to generate standard curves, with a detection limit 
of ~30 and 10-15 gene copies/ml for 16S rRNA and narG variants, respectively. 
Assays were run on a 7500 Fast PCR System and a StepOnePlus Real-Time PCR 
System (Applied Biosystems). All samples were run in triplicate with conditions 
as follows: 2 min incubation at 50°C, followed by 10 min at 95°C followed by 
40 cycles of denaturation at 95°C (15s) and annealing at 60°C (1 min). 
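
The standard-curve step described above can be sketched as a linear regression of threshold cycle (Ct) on log10 of the known copy number in each dilution, which is then inverted to estimate copies in unknown samples. The sketch assumes this conventional Ct-based approach; the function names are hypothetical and the dilution series and Ct values are invented for illustration only.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate gene copies from a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 corresponds to perfect doubling."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical tenfold dilution series with idealized, noise-free Ct values.
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
cts = [40.0 - 3.32 * math.log10(c) for c in standards]
slope, intercept = fit_standard_curve(standards, cts)
print(slope, intercept, amplification_efficiency(slope))
```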


Data availability. Metagenomes from the GoM are available from the BioProject 
database under accession number PRJNA291283. Sample accession numbers and 
further information on all metagenomics and SAG data sets used in this study are 
provided in Supplementary Tables 4 and 1, respectively. 

Extended Data Figure 1 | Evaluation of contamination based on
MyTaxa taxonomic affiliations. a, Representative MyTaxa plots to
test for contamination based on taxonomic affiliations of predicted
genes. The MyTaxa algorithm®! predicts the taxonomic affiliation on
the basis of a weighted classification scheme that takes into account the
phylogenetic signal of each protein family. Each gene is assigned to the 
deepest taxonomic resolution (out of phylum, genus and species) for 
which a high-confidence value can be obtained (score 0.5). Each MyTaxa 
scan represents taxonomic distributions of all the predicted genes for one 
genome, given in windows of 10 genes, and sorted based on their position 
in the concatenated assembly of the genome (when a partial genome is 
used). a, b, White space in the histograms represents genes that could not 
be assigned to a given taxon due to (1) lack of BLASTP hits against the 
reference database (a collection of closed and draft genomes) or (2) lack 
of high confidence scores. Notice that for the representative OMZ SAG E5, 
more than 80% of the genes can be classified as Candidatus Pelagibacter 
(SAR11), with an additional 10% assigned to Proteobacteria. Note there are
no genome representatives for this taxon (that is, SAR11 subclade Ia.A) 
in the database upon which MyTaxa is based. Similar results are obtained 
for the bathytype SAR11 SAG, as this genome also lacks representatives. 
The closed genome from a coastal isolate HTCC1002 is shown for 
comparison to demonstrate a typical pattern for cases when close relatives 
of the query genome are available in the reference database, as is the case 
for this isolate. b, Taxonomic classifications of genes from the 19 SAGs 
analysed here. Each distribution was obtained from the MyTaxa scans 
performed for each SAG. The percentage of the total genes that could be 
taxonomically classified with MyTaxa was on average ~60%, and varied 
depending on the completeness of the genome (that is, partial genes are 
less likely to be assigned taxonomy with high confidence). These values 
are also reported in Supplementary Table 1. Of the genes that could be 
classified, the majority (>90%) were classified to SAR11 taxa. 

Extended Data Figure 2 | Microdiversity within the SAR11 populations.
a, Recruitment plot of metagenomic reads from the ETNP OMZ 300m
sample, against scaffolds from SAG E4. Notice that the recruited reads
vary in identities from 100% down to 85%, indicating the presence
of closely affiliated clades, as well as extensive microdiversity within
the same clade (that is, reads sharing >95% identity). b, Phylogenetic
reconstruction of reference RpoB protein sequences from SAR11 genomes,
and placement of identified RpoB metagenomic sequences (denoted with
the cross symbols). The alignment length was 1,406 columns with 5.9%
gaps or undetermined sites. The presence of multiple divergent rpoB reads
within the same subclade (predominantly for subclades IIa.A and Ic)
suggests high abundance but also extensive microdiversity within those
populations (rather than clonal populations).

Extended Data Figure 3 | nar genes encoded by SAR11 populations of
OMZs. a, nar operon and adjacent genes identified in SAR11 SAGs from 
the ETNP OMZ, and in assemblies from the 85 m and 300 m ETNP OMZ 
metagenomes. narG sequences with at least 97% amino acid similarity 
are represented with the same colour. b, c, Representative maximum 
likelihood phylogeny to show sequence variation among full-length or 
near full-length narG (b) and narH (c) amino acid sequences identified 
in the SAGs. A subset of cytoplasm-oriented Nar and Nxr enzymes from 
publicly available genomes is also included. A comprehensive phylogeny 
showing the placement of SAR11 nar sequences relative to enzymes
(n = 392) of the DMSO family is in Fig. 2a. Coloured pies represent the
placement of shorter narG/narH gene fragments identified in the SAGs. 
Bootstrap values over 50 are shown. Outgroups (arrows) are E. coli dmsA 
(b) and dmsB (c). Note that the Gamma-type nar-containing contig 
recovered in E4 (Fig. 2a) contains narHJI, but not narG; E4 Gamma-type 
is therefore not represented in Fig. 3b. All genes co-localized in the 
nar-containing contigs are listed in Supplementary Table 5. The 
p-numbers are gene identifiers given by the gene prediction software, 
consistent with those in Supplementary Table 5. 


Extended Data Figure 4 | Identified NarG in SAR11 SAGs are
members of the DMSO superfamily of oxidoreductases. a, Phylogenetic
reconstruction of NarG and DMSO enzymes. The tree shown in
Fig. 2 is presented here but has been expanded to include diverse DMSO
oxidoreductases for direct comparison with the NarG/NxrA enzymes.
Notice that both OP1 (green, blue, grey) and Gamma-type (red, orange)
variants cluster within the cytoplasmically oriented Nar and Nxr enzymes.
Six-hundred and ninety-seven NarG/NxrA proteins were identified
from UniRef*’, and from those, 321 full-length sequences were selected
to represent all the diverse clades. An additional 71 non-NarG/NxrA
proteins, representative of the diverse enzymes of the DMSO superfamily,
were also included in the collection. The full-length amino acid sequences
were aligned with Clustal Omega® and the phylogenetic tree was
constructed by maximum likelihood and 1,000 bootstraps using RAxML®.
The alignment length was 1,803 columns, out of which 31.2% were gaps
or undetermined. Partial NarG sequences identified in the SAGs were
placed on the tree using the EPA algorithm from RAxML®. The same
collection of proteins was used to train the ROCker models and quantify
the narG metagenomic fragments, and can be found in the enve-omics
website (http://enve-omics.ce.gatech.edu/rocker/models). b, Alignment
of NarG sequences from OMZ SAR11 with representative sequences from
the DMSO superfamily of oxidoreductases. The protein motifs in the
second and third panels are present in all functional Nar enzymes (NarG)
and Nxr enzymes (NxrA) but not in closely related enzymes of the DMSO
superfamily. The first panel shows the presence/absence of the TAT signal
peptide (SRRSFLK), whose presence typically denotes a protein excreted
to the outer membrane**!. SAR11 NarG is instead oriented towards
the cytoplasm (lack of TAT). The second panel shows the cysteine-rich
motif typically found in the N terminus of the type-II DMSO superfamily 
oxidoreductases”? and believed to enable the formation of a [4Fe-4S] 
cluster in these proteins’®. The Asn in position 158 of the alignment is 
typically found in catalytic subunits of nitrite reductases and DMSO 
oxidoreductases (DmsA) but not in other DMSO family enzymes. The 
third panel shows the Gln(Q) and Thr(T) in positions 398 and 399 within 
the putative substrate entry channel of the protein, which differentiate the 
Nar proteins from all other oxidoreductases of the DMSO family*”. 

Extended Data Figure 5 | Functional characterization of the SAR11 nar
operons in the E. coli heterologous expression system. a, Genotype of the
E. coli triple mutant confirmed by whole-genome sequencing. The triple
mutant lacks complete functional operons of all three NO3− reductase
enzymes, and thus is incapable of NO3− reduction. b, Anaerobic growth
of triple-mutant clones, complemented with the SAR11 nar operons.
For each strain three independent clones were monitored, and data from
the replicate growth curves were fitted to a logistic model. Shaded
areas represent the 95% confidence intervals of optical density readings
(OD600 nm) in the fitted logistic growth models. NO3− and NO2− were
measured in parallel with ion chromatography. Note that the Gamma-type
SAR11 operon complements the triple-mutant phenotype, growing
anaerobically by reducing NO3− to NO2−. E. coli encodes functional
nitrite reductases, thus the accumulated NO2− can be further reduced
to ammonia, accounting for the non-stoichiometric NO2− production.
c, Whole-cell NO2− production assays under aerobic conditions. Eight
independent clones (columns A-H) of each type (C1-C5) were inoculated
in Luria-Bertani (LB) broth supplemented with 30 mM NO3− and
different isopropyl-β-D-thiogalactoside (IPTG) concentrations, and the
well plate was incubated for 2 days at room temperature. Griess reagent
was added, and development of pink colour indicated NO2− production.

Extended Data Figure 6 | Relative abundance of narG variants in
ETNP OMZ metagenomes and metatranscriptomes and various other
ocean metagenomes. a, Relative abundance and diversity of NarG/
NxrA enzymes as revealed by phylogenetic placement of identified narG
metagenomic reads (coloured pies). All identified short metagenomic
narG reads from various oceanic metagenomes were placed within a
reconstructed reference NarG tree to estimate the abundance of the
different narG variants. The results of the placement are presented in
five separate trees, based on the origin of the analysed metagenomic reads
(ETSP metagenomes, ETNP metagenomes and metatranscriptomes,
oxic bathypelagic and oxic surface metagenomes) for clarity. In each of
the five trees, the coloured pies represent the abundance (normalized for
data set size) of the short metagenomic reads clustering in the respective
node. Specifically, the pie radius reflects read abundance as a percentage
of the total narG genome equivalents identified (that is, number of narG
reads compared to number of rpoB reads, normalized for gene length and
total number of reads in each metagenome), with the size of grey pies
representing the highest and lowest relative abundance, respectively.
The reference tree is the same as in Fig. 3a. Scale bars represent
substitutions per amino acid. Notice that the two narG variants affiliated
with the SAR11 SAGs (highlighted in orange for the OP1 type and
blue for the Gamma type) are only abundant in the metagenomes
and metatranscriptomes from the OMZ, where they comprise more
than 70% of the total narG read pool, as can also be observed in
Fig. 3b and c. The numbers of narG reads of the OP1 or Gamma type are
also given in Supplementary Table 1. b, qPCR-based abundance of SAR11-
affiliated narG genes in the ETNP OMZ relative to NO2−, NO3− and O2
concentrations and qPCR-based counts of 16S rRNA. Counts of total
bacterial 16S rRNA, OP1-type narG, and Gamma-type narG genes at three
stations (map on legend) west of Manzanillo, Mexico in May 2014. Map
was created with Ocean Data View (http://odv.awi.de). All assays were
performed in triplicate, and the bars represent s.e.m. Note that
counts of OP1- and Gamma-type narG variants are probably
underestimates given the observed microdiversity in the community
(Extended Data Figs 2 and 7), and therefore there is a possibility that our
primers did not match all OP1- and Gamma-type variants.
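
As an illustration of the normalization described in a, the narG genome equivalents can be sketched as the length-normalized ratio of narG reads to reads of the single-copy marker rpoB. This is a minimal sketch and not the authors' pipeline: the function names, gene lengths and read counts below are hypothetical placeholders, and the total-read term is omitted on the assumption that both counts come from the same metagenome, in which case it cancels from the ratio.

```python
def genome_equivalents(target_reads, target_len_bp, marker_reads, marker_len_bp):
    """Return target-gene copies per genome, estimated from read counts.

    Read counts are divided by gene length so that a longer gene does not
    appear more abundant simply because it recruits more reads.
    """
    if marker_reads <= 0:
        raise ValueError("marker gene must have at least one read")
    target_density = target_reads / target_len_bp   # reads per bp of narG
    marker_density = marker_reads / marker_len_bp   # reads per bp of rpoB
    return target_density / marker_density


def variant_percentages(reads_by_variant):
    """Express each variant's read count as a percentage of all narG reads."""
    total = sum(reads_by_variant.values())
    if total == 0:
        raise ValueError("no narG reads to apportion")
    return {name: 100.0 * n / total for name, n in reads_by_variant.items()}


if __name__ == "__main__":
    # Hypothetical counts and gene lengths, for illustration only.
    reads = {"OP1-type": 450, "Gamma-type": 300, "other": 250}
    ge = genome_equivalents(sum(reads.values()), 3700, 900, 4000)
    print(f"narG genome equivalents: {ge:.2f}")
    print(variant_percentages(reads))
```

Dividing by the rpoB read density expresses narG abundance per genome rather than per read, which is what allows samples of different sequencing depth to be compared.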

Extended Data Figure 7 | Diversity of OP1 and Gamma-type narG
amino acid sequences in the ETNP OMZ metagenome. a, Phylogenies
showing all full-length narG sequences recovered in the ETNP OMZ
metagenomes (85, 100, 125, 300 m), as well as those from the SAR11
SAGs and corresponding narG reference sequences, with the left tree
showing OP1-type variants and the right tree showing Gamma-type
variants. NarG sequences are colour-coded based on the taxonomic
classification of adjacent genes in the same metagenomic scaffolds, as
shown in Supplementary Table 6. b, Recruitment of metagenomic reads
(predicted open reading frames) from the OMZ 300 m sample, against
OP1- (left) or Gamma- (right) type narG sequences from the SAR11 SAGs.
The metagenomic reads used for recruitment were identified as 'narG'
using the ROCker pipeline, and their identity was further confirmed
by phylogenetic placement within the narG clade on a reference DMSO
superfamily protein tree, to minimize non-specific recruitments in
conserved protein regions. Note that based on this analysis, the OP1-type
narG variants are highly diverse in the OMZ metagenome.

An evolutionarily conserved pathway
controls proteasome homeostasis


Adrien Rousseau¹ & Anne Bertolotti¹


The proteasome is essential for the selective degradation of most cellular proteins, but how cells maintain adequate 
amounts of proteasome is unclear. Here we show that there is an evolutionarily conserved signalling pathway controlling 
proteasome homeostasis. Central to this pathway is TORC1, the inhibition of which induced all known yeast 19S regulatory
particle assembly-chaperones (RACs), as well as proteasome subunits. Downstream of TORC1 inhibition, the yeast
mitogen-activated protein kinase, Mpk1, acts to increase the supply of RACs and proteasome subunits under challenging
conditions in order to maintain proteasomal degradation and cell viability. This adaptive pathway was evolutionarily
conserved, with mTOR and ERK5 controlling the levels of the four mammalian RACs and proteasome abundance. Thus,
the central growth and stress controllers, TORC1 and Mpk1/ERK5, endow cells with a rapid and vital adaptive response to
adjust proteasome abundance in response to the rising needs of cells. Enhancing this pathway may be a useful therapeutic 
approach for diseases resulting from impaired proteasomal degradation. 


Cell survival depends on adaptive signalling pathways to ensure that the 
supply of vital components matches fluctuating needs. The proteasome 
is essential for the selective degradation of most cellular proteins and 
thereby has a key role in most cellular processes!°. Proteasome abun- 
dance is crucial for cell fitness, but how cells maintain adequate amounts 
of proteasome is unclear. Failure to degrade mutant or misfolded 
proteins causes diverse diseases, including devastating neuro- 
degenerative diseases, which might potentially be prevented by increas- 
ing proteasome degradation’. Although the idea is attractive, increasing 
proteasome capacity remains a challenge. Thus, a better understanding 
of the mechanisms regulating proteasome abundance is required. 

The proteasome is composed of 33 subunits assembled in two 
sub-complexes, the 20S core particle (CP), flanked at one or both 
ends by the 19S regulatory particle (RP) to form the 26S proteasome’. 
Proteasome assembly requires the assistance of proteasome assembly 
chaperones’. Four evolutionarily conserved 19S RACs: Nas2, Nas6, 
Hsm3 and Rpn14 in yeast, and p27 (also known as PSMD9), p28 (also 
known as PSMD10), S5b (also known as PSMD5) and Rpn14 (also 
known as PAAF1) in mammals are needed for regulatory particle 
assembly>?. In addition, yeast cells have Adc17, a stress-inducible RAC, 
which is vital for cells to survive conditions, such as accumulation of 
misfolded proteins, which overwhelm the proteasome’®. This suggests 
that cells have evolved adaptive signalling pathways to adjust protea- 
some assembly to arising needs, but how this is achieved is unknown. 


TORC1 inhibition increases Adc17 and the proteasome

To determine how yeast cells maintain proteasome homeostasis, we
decided to investigate the pathway regulating Adc17. Adc17 is upregulated
by diverse stresses that impose a high burden on the proteasome,
indicating that it is a component of an unknown generic stress
response. Because Adc17 is induced by tunicamycin, an inducer of the
unfolded protein response (UPR)!!, we deleted the UPR genes IRE1
or HAC1 (ref. 11). This prevented tunicamycin-mediated induction of
the UPR marker Kar2, as expected}, but not that of Adc17 (Fig. 1a),
indicating that ADC17 was not a UPR target gene. We tested Adc17
induction by tunicamycin in mutants thought to regulate Adc17 from
a genome-wide regulation study’, and found that deletion of SFP1
abolished Adc17 but not Kar2 induction by tunicamycin (Fig. 1b).
Adc17 induction by tunicamycin was higher in a strain carrying a
hypomorphic allele of MRS6, a negative regulator of Sfp1 (Extended Data
Fig. 1a). Sfp1 is a stress- and nutrient-sensitive regulator of cell growth
with dual function!?"!5. Under optimal growth conditions, Sfp1 is
located in the nucleus and can activate transcription of ribosomal
protein genes, but it re-localizes to the cytosol upon stress'*!*, Sfp1 is
activated by TORC1, and in turn negatively regulates TORC1 signalling, as a
feedback mechanism)’. In the absence of Sfp1, TORC1 is hyperactive’.
Thus, SFP1 deletion could prevent Adc17 induction directly or by
over-activating TORC1. Adc17 induction by tunicamycin (Fig. 1a, b)
coincided with Sfp1 re-localization from the nucleus to the cytosol
(Extended Data Fig. 1b), suggesting that Sfp1 may regulate Adc17 not
directly, but instead indirectly through TORC1. Tunicamycin inhibits
TORC1 signalling’, as observed (Fig. 1c) with the phosphorylation of
the TORC1 effector Sch9 (ref. 16). In the absence of Sfp1, TORC1 was
hyperactive’* (Fig. 1c), and it remained active during tunicamycin-mediated
stress, while Adc17 induction was abolished (Fig. 1c), suggesting
that Sfp1 regulated Adc17 via TORC1. Confirming this, rapamycin,
a selective inhibitor of TORC1 (ref. 17), induced Adc17 (Fig. 1d).
Deletion of SFP1 abolished induction of Adc17 by tunicamycin but not
by rapamycin (Fig. 1e) because SFP1 deletion affected Adc17 expression
by hyperactivating TORC1 (Fig. 1f). To confirm this using a genetic
approach, we examined Adc17 regulation in the thermosensitive kog1-1
mutant. Kog1 (Fig. 1f) is the yeast homologue of Raptor, a subunit of
TORC1 (ref. 18). Inactivation of KOG1 inhibited TORC1, as expected!®,
and induced Adc17 (Fig. 1g), indicating that selective TORC1 inhibition
induces Adc17. We investigated whether rapamycin increased
proteasome abundance. Consistent with our previous results for
tunicamycin’®, proteasome levels increased by more than twofold after 3 h
of rapamycin treatment (Fig. 1h, i). Thus, inhibition of the central stress
and growth controller, TORC1, increases abundance of Adc17 and of
the proteasome in yeast.


The MAPK Mpk1 induces Adc17 

TORC1 integrates multiple signalling pathways'”!°. We searched for
the pathway downstream of TORC1 controlling Adc17 and proteasome
abundance. Adc17 is not a UPR gene (Fig. 1a), but adc17Δ cells are
sensitive to tunicamycin-mediated stress!°. Therefore, we examined
non-UPR mutants sensitive to tunicamycin. The mitogen-activated protein
kinases (MAPKs) Hog1 and Mpk1 were important for tunicamycin-stress
survival in yeast (Fig. 2a), as expected”®, unlike the other MAPKs
Fus3, Kss1 and Smk1 (Fig. 2a); Hog1 being advantageous and Mpk1
essential for stress survival (Fig. 2b). Adc17 induction by tunicamycin
was compromised in HOG1 deleted cells and abolished in cells lacking
a functional allele of MPK1 (Fig. 2c and Extended Data Fig. 2a, b)
revealing a perfect correlation between tunicamycin stress-resistance
and Adc17 induction. Genetic interaction studies showed that overexpression
of HOG1 failed to restore tunicamycin resistance and Adc17
induction in mpk1Δ cells (Fig. 2d, e), while overexpression of MPK1
increased both tunicamycin resistance and Adc17 induction in hog1Δ
cells (Fig. 2f, g). Thus, signalling through Mpk1 is required for Adc17
induction and tunicamycin survival.

Figure 1 | TORC1 inhibition induces the proteasome assembly
chaperone Adc17 and increases proteasome levels. a–c, Immunoblots
of lysates from yeast cells treated ± tunicamycin (Tm) for 4 h. Here
and elsewhere, unless specified otherwise, the tunicamycin treatment
was ± 5 µg ml−1. WT, wild type. d, Immunoblots of lysates from yeast cells
treated ± rapamycin (Rapa) for 4 h. Here and elsewhere, unless specified
otherwise, the rapamycin treatment was ± 0.2 µg ml−1. e, Immunoblots of
yeast cell lysates after treatment ± tunicamycin or rapamycin for 4 h.
f, Schematic depicting the relationship between Sfp1, TORC1 and Adc17.
g, Immunoblots from yeast cells cultured at 30 °C or 37 °C for 4 h. h, Native
polyacrylamide gel electrophoresis (PAGE) (4.2%) of yeast extracts from
cells treated with tunicamycin or rapamycin, monitored by the fluorogenic
substrate Suc-LLVY-AMC and by immunoblots. i, Quantification of
the 26S proteasome activity (RPCP and RP2CP) in four independent
experiments such as the one shown in h. Data are mean ± s.d.; n = 4
biological replicates. ***P < 0.001; NS, not significant (one-way ANOVA).

We examined if Mpk1 was required for Adc17 induction by
rapamycin. MPK1 is negatively regulated by TORC1 and essential for
rapamycin survival*!””. Unlike the other MAPKs, Mpk1 was essential
for both cell viability and Adc17 induction in the presence of rapamycin
(Extended Data Fig. 2c–e). HOG1 contributed to Adc17 upregulation
by tunicamycin (Fig. 2c) but not by rapamycin (Extended Data Fig. 2d).
HOG1 was dispensable for survival in the presence of rapamycin
(Extended Data Fig. 2c). Thus, induction of Adc17 and rapamycin-resistance
are perfectly correlated (Extended Data Fig. 2c, d). In agreement
with a previous study’, the levels of Mpk1 increased in response
to tunicamycin treatment (Extended Data Fig. 2d), but this increase
was markedly attenuated in hog1Δ cells (Extended Data Fig. 2d).
Thus, one key function of Hog1 is to regulate Mpk1 levels (Fig. 2h),
providing an explanation for why Mpk1 overexpression in hog1Δ cells
rescued tunicamycin-resistance and Adc17 induction (Fig. 2f, g). Over
time, both Mpk1 phosphorylation and abundance were increased by
tunicamycin and rapamycin treatment and this preceded Adc17 induction
(Extended Data Fig. 3a, b). Bck1, Mkk1 and Mkk2, three kinases that
are upstream of Mpk1 (ref. 24), were also required for Adc17 induction
by tunicamycin and rapamycin treatment (Extended Data Fig. 3c, d).
Congo red, a cell-wall-damaging agent and known inducer of the Mpk1
MAPK pathway” also induced Adc17, in a Mpk1-dependent manner
(Extended Data Fig. 3e). These results indicate that diverse challenges
inhibiting TORC1 signal to the Mpk1 MAPK to induce the proteasome
assembly chaperone Adc17.


Mpk1 is a master regulator of the proteasome 

We focused on Mpk1 because it is essential for Adc17 induction (Fig. 3a)
and examined whether Mpk1 regulated proteasome abundance.
Deleting MPK1 completely abolished the tunicamycin- or rapamycin-induced
increase of 26S proteasomes while increasing the abundance
of the free core particles (Fig. 3b–d). This defect is symptomatic of
regulatory particle assembly defects®*, and a hallmark of adc17Δ cells
in response to stress!°. However, the mpk1Δ cells (Fig. 3b–d) appeared
more severely affected than adc17Δ cells'®, suggesting that other
MPK1-regulated factors assist regulatory particle assembly. We found
that all the known yeast RACs: Nas2, Nas6, Hsm3 and Rpn14 were
induced by treatment with tunicamycin, rapamycin or Congo red in
wild-type cells (Fig. 3e and Extended Data Fig. 3f). Genetic inactivation
of TORC1 in kog1-1 cells also induced all RACs at the non-permissive
temperature (Fig. 3f). Induction of all yeast RACs by tunicamycin and
rapamycin was abolished in mpk1Δ, bck1Δ and mkk1/2Δ cells (Fig. 3g
and Extended Data Fig. 3g, h). Overexpression of different combinations
of three RACs markedly improved tunicamycin resistance
in mpk1Δ cells (Extended Data Fig. 4a). Conversely, the deletion of
three RACs severely impaired cell viability in the presence of rapamycin
(Extended Data Fig. 4b). Thus, regulating the expression of RACs
is a key function of Mpk1. These results reveal that downstream of
TORC1 inhibition, signalling through the Mpk1 MAPK pathway coordinates
the induction of all RACs to control proteasome abundance and
viability upon various stresses.

Tunicamycin and rapamycin increased 26S abundance in wild-type
cells and increased free core particles in mpk1Δ cells (Fig. 3b), suggesting
that core particle assembly might also be regulated. We analysed the
levels of the core particle assembly chaperones proteasome biogenesis-associated
(Pba) 1–4 (refs 25, 26) after tunicamycin treatment, the most
potent inducer of core particles in mpk1Δ cells (Fig. 3b, d). In wild-type
cells, tunicamycin treatment increased the level of Pba1 and Pba2 but not
the level of Pba3 and Pba4 (Extended Data Fig. 5a–d). Thus, the increase
in core particles was accompanied by an increase of the assembly
chaperones Pba1 and Pba2. This increase was unaltered upon MPK1
deletion (Extended Data Fig. 5a–d). This demonstrates that Pba1 and
Pba2 are upregulated by the stress caused by tunicamycin treatment
and their regulation is independent of Mpk1. The mechanism of Mpk1-independent
regulation of Pba1 and Pba2 will be an important topic
for future study.

We examined the regulation of proteasome subunits. Both tunica- 
mycin and rapamycin treatment increased the levels of proteasome 
subunits, and this increase required Rpn4, the transcription factor 
controlling expression of proteasome subunits”’ (Extended Data Fig. 
6a, b). Rpn4 increased upon tunicamycin or rapamycin treatment 
(Extended Data Fig. 6c). In contrast, Adc17 is upregulated independently 
of Rpn4 upon diverse stresses (ref. 10), and all yeast RACs show this same 
pattern of regulation (Extended Data Fig. 6b). Upregulation of 
proteasome subunits depends on Rpn4, and upregulation of all known 
RACs is independent of Rpn4. Deletion of MPK1 completely abrogated 
the tunicamycin- and rapamycin-induced upregulation of proteasome 
subunits, indicating that Mpk1 is a master regulator of proteasome 
homeostasis (Fig. 4a and Extended Data Fig. 6d). 

Figure 2 | The MAPK Mpk1 is a master regulator of the stress-inducible
proteasome assembly chaperone Adc17. a, b, Cells spotted in a sixfold
dilution and grown for 3 days on plates ± tunicamycin. c, Immunoblots
from yeast cells grown ± tunicamycin for 4 h. d, f, Cells transformed with
empty vector or with MPK1 or HOG1 spotted in a sixfold dilution and
grown on plates ± 0.75 µg ml−1 tunicamycin for 3 days. e, g, Immunoblots
of lysates from yeast cells grown ± tunicamycin for 4 h. h, Schematic of the
Adc17 signalling pathway.

Figure 3 | Mpk1 coordinates the expression of all yeast RACs to control
proteasome abundance. a, Immunoblots of lysates from yeast cells
cultured ± tunicamycin or rapamycin for 4 h. b, Native PAGE (4.2%)
of yeast cells cultured ± tunicamycin or rapamycin, monitored by
Suc-LLVY-AMC and by immunoblots. Rpt5! (Rpt5 intermediates).
c, d, Quantifications from experiments as in b. Data are mean ± s.d. of
four biological replicates. **P < 0.01; ***P < 0.001; NS, not significant
(two-way ANOVA). e, Immunoblots from lysates of yeast cells
cultured ± tunicamycin or rapamycin for 4 h. f, Immunoblots from lysates
of yeast cells cultured at 30 °C or 37 °C for 4 h. g, Immunoblots from lysates
of yeast cells cultured ± tunicamycin or rapamycin for 4 h.

Figure 4 | Post-transcriptional control of RAC and proteasome subunit
abundance by Mpk1. a, Relative abundance of the indicated proteins
in yeast cells treated with rapamycin for 4 h relative to untreated cells.
b, Relative abundance of the indicated mRNA in yeast cells treated with
rapamycin for 2 h relative to untreated cells. rpl18a was used as a control.
a, b, Data are mean ± s.d.; n = 3 biological replicates. *P < 0.05, **P < 0.01;
NS, not significant (two-way ANOVA).

We identified a weak genetic interaction between RPN4 and MPK1,
and found that both were required for survival in response to tunicamycin
treatment (Extended Data Fig. 6e, f). Tunicamycin and rapamycin
increased Rpn4 levels to wild-type levels in mpk1Δ cells (Extended
Data Fig. 6g), suggesting that Mpk1 is acting downstream of the transcription
factor Rpn4, possibly post-transcriptionally. At the protein
level, MPK1 deletion completely abrogated the induction of proteasome
subunits and RACs by rapamycin treatment (Fig. 4a). At the mRNA
level, rapamycin only modestly, yet reproducibly, increased abundance
of RACs and proteasome subunits mRNA (Fig. 4b), and this increase
was similar in wild-type and mpk1Δ cells (Fig. 4b). Rpn4 induction
was similar in both strains (Extended Data Fig. 6g). Blocking the synthesis
of new proteins with cycloheximide for 4 h did not change the
abundance of proteasome subunits and RACs, indicating that they
were stable over this time period (Extended Data Fig. 6h, lanes 1
and 4). Likewise, the stability of proteasome subunits and RACs
appeared similar in mpk1Δ cells and wild-type cells (Extended Data
Fig. 6i). However, cycloheximide completely blocked induction of
proteasome subunits and RACs by tunicamycin and rapamycin in wild-type
cells (Extended Data Fig. 6h). Together these results reveal that
the MAPK Mpk1 coordinates the translation of proteasome subunits
and RACs to provide the increased proteasome abundance required
to sustain cell viability.


Mpk1 adapts proteasome degradation to rising needs 

We analysed the consequences of the MPK1-dependent increase of
proteasome abundance on protein degradation. Polyubiquitinated
conjugates represent a hallmark of impaired proteasomal degradation
and were slightly elevated in mpk1Δ cells compared to wild type
(Fig. 5a, b). This defect was exacerbated upon tunicamycin or rapamycin
treatment (Fig. 5a, b), suggesting impaired proteasomal degradation,
and providing an explanation for why mpk1Δ cells failed
to survive tunicamycin (Fig. 2a) or rapamycin treatment (Extended
Data Fig. 2c).

We examined the degradation of diverse proteasome reporter
substrates. The metastable Ura3-3 reporter was rapidly degraded in
wild-type cells cultured at 37 °C, but not in cells harbouring a thermosensitive
mutation in the proteasome subunit Rpt4 (Extended Data
Fig. 7a, b). Similarly, the degradation of the reporter substrate was strikingly
compromised in mpk1Δ cells (Extended Data Fig. 7c, d). The degradation
of the two well-characterized proteasome reporter substrates,
CPY*-HA and Δss-CPY*-GFP, which are localized in the endoplasmic
reticulum and in the cytosol, respectively*?*°, was also compromised
in mpk1Δ cells (Fig. 5c–f). The protein degradation defect of mpk1Δ
cells was more pronounced in cells challenged with tunicamycin and
rapamycin treatment (Extended Data Fig. 7e-). Together with the
previous findings, this demonstrates that Mpk1 maintains adequate
levels of proteasome required to sustain protein degradation and cell
viability under challenging conditions.


Evolutionary conservation of proteasome regulation 

Four RACs are evolutionarily conserved with p27, p28, S5b and Rpn14
being human orthologues of the yeast Nas2, Nas6, Hsm3 and Rpn14,
respectively*°. We investigated whether the TORC1 and Mpk1 regulation
of RACs was evolutionarily conserved. Inhibition of mTOR by
Torin-1 rapidly increased the levels of all mammalian RACs (Fig. 6a, b),
similar to what was found in the experiments in yeast (Fig. 3e, f).
mTOR inhibition resulting from nutrient starvation also increased the
RACs (Extended Data Fig. 8a, b). As in yeast, the concerted increase of
the RACs was accompanied by an upregulation of proteasome subunits
(Fig. 6a, b), and resulted in an increase in the levels of 26S proteasome
(Fig. 6c, d and Extended Data Fig. 8c, d). This response was acute,
with a rapid return to basal levels (Fig. 6a–d). As previously reported*!,
singly capped proteasome RPCP (CP, core particle; RP, regulatory particle)
was more abundant than doubly capped proteasome RP2CP in
mammalian cells (Fig. 6c).

Figure 5 | Mpk1 adjusts proteasome degradation to match the needs
of the cell. a, Immunoblots of lysates of yeast cells cultured ± tunicamycin
or rapamycin for 4 h. Poly-Ub: polyubiquitinated conjugates.
c, e, Immunoblots from lysates of yeast cells expressing CPY*-HA (c) or
Δss-CPY*-GFP (e) treated with 35 µg ml−1 cycloheximide (CHX) for the
indicated time. b, d and f, show quantification of a, c and e, respectively.
Data are mean ± s.d.; n = 4 (b) and n = 3 (d, f) biological replicates. *P < 0.05;
**P < 0.01; ***P < 0.001; NS, not significant (two-way ANOVA).

Conversely, medium replenishment to increase nutrient supply and
activate mTORC1 had the opposite effect resulting in S6K1 phosphorylation
(Extended Data Fig. 9a) and decreasing abundance of both RACs
(Extended Data Fig. 9a, b) and proteasome (Extended Data Fig. 9c, d).
Rapamycin, a selective mTORC1 inhibitor, also acutely and transiently
induced the RACs as well as proteasome subunits (Extended Data
Fig. 10), confirming that, as in yeast, mTORC1 controls proteasome
homeostasis. We wondered whether ERK5 (also known as MAPK7)
(ref. 32), the mammalian orthologue of Mpk1, also regulates proteasome
abundance. ERK5 overexpression in yeast rescued tunicamycin
resistance in mpk1Δ cells (Fig. 6e). Knocking down ERK5 with short
interfering RNA (siRNA) in human cells resulted in a decrease of
the four mammalian RACs p27, p28, S5b and Rpn14 (Fig. 6f, g), as
well as the 26S proteasome (Fig. 6h, i). Thus, mammalian ERK5,
like yeast Mpk1, controls RACs and thereby acts as a switch to control
proteasome abundance.


Discussion 

Here we report a general and evolutionarily conserved homeostatic
response that increases proteasome abundance as needed, through the
coordinated upregulation of regulatory particle assembly chaperones
and proteasome subunits. The master regulators of growth and stress,
TORC1 and Mpk1/ERK5, are central to this response. Consistent with
the general principle of homeostatic responses, we observed that proteasome
increase is an acute and rapidly reversible response. Trying
to identify the other components of this proteasome homeostatic
response, in particular the mechanisms regulating 20S assembly and
determining how proteasome levels return to baseline after an acute
increase will be the subject of future studies.

Our results also provide a framework for resolving inconsistencies in
previous observations. It was reported that when cultured in absence
of serum, proteasomal degradation is increased in cells lacking Tsc2, a
negative regulator of TORC1 (ref. 33). In contrast to this, a recent study
reported that mTOR inhibition activates proteasomal degradation by
a mechanism proposed to be driven by increased ubiquitination**.
Considering this in light of our results, it may be the adaptive response
to the stress resulting from the lack of Tsc2 combined with serum starvation
that increases proteasomal degradation in Tsc2−/− cells, rather
than Tsc2 deletion per se.

Figure 6 | Evolutionary conservation of the pathway controlling RACs
and proteasome abundance. a, b, Immunoblots (a) and quantifications
(b) of the indicated proteins in lysates of HeLa cells treated with
250 nM Torin-1 for the indicated time. c, d, Native PAGE (4.2%) (c) and
quantifications (d) of HeLa cell lysates following treatment as in a and
revealed with Suc-LLVY-AMC and by immunoblots. e, mpk1Δ cells
transformed with a plasmid encoding the human ERK5 or an empty vector
were spotted in a sixfold dilution and grown on plates ± tunicamycin
for 3 days. f, g, Immunoblots (f) and quantifications (g) of the indicated
proteins in lysates of HeLa cells 3 days after transfection with a non-target
siRNA (siCTL) or a siRNA targeting ERK5 (siERK5). h, i, Native
PAGE (4.2%) (h) and quantifications (i) of HeLa cell extracts 3 days after
transfection with siCTL or siERK5 monitored by Suc-LLVY-AMC or by
immunoblots. b, d, g, i, Data are mean ± s.d.; n = 3 biological replicates.
*P < 0.05; **P < 0.01; ***P < 0.001; NS, not significant (b, d, one-way
ANOVA; g, two-way ANOVA; i, two-tailed Student's t-test).

In line with our findings is the well-established notion that mTOR
activation enhances anabolic processes and represses catabolic
processes!**°. mTORC1 is known to repress autophagy. We show here
that TORC1 restricts proteasome abundance and this is rapidly alleviated
upon TORC1 inhibition. Therefore, the same controller TORC1
restricts the abundance of the two cellular proteolytic systems, the
proteasome and autophagy. Our findings integrate the regulation of
proteasome assembly and abundance with growth and cellular metabolism,
and suggest that the increased proteasome capacity resulting
from TORC1 inhibition may also contribute to the benefit of the widely
used TORC1 inhibitors.

The current prevailing view is that protein degradation is largely 
regulated at the level of ubiquitination. Here we demonstrate that mod- 
ulating proteasome abundance is an important component of regula- 
tion of proteasomal degradation. Adapting proteasome abundance is 
vital to cope with overwhelming cellular needs, implying that protea- 
some abundance can be rate limiting under critical conditions. The 
evolutionary conservation of the TORC1 and Mpk1/ERK5 pathway 
controlling proteasome abundance further highlights the importance 
of this regulation. 

The pathway identified here can be used as a unique switch to 
increase proteasome assembly and abundance on demand. Because 
many human diseases are associated with accumulation of misfolded 
proteins, increasing proteasome abundance by manipulating the 
switches identified here could be used as a generic strategy to reduce 
the burden of misfolded proteins that accumulate in such age-related 
diseases. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


Data reporting. No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded to allocation 
during experiments and outcome assessment. 

Yeast strains, plasmids and growth assays. Gene-deletion mutants and their 
isogenic wild-type strain (BY4741) were grown in YPD medium according to 
standard protocols. To assess growth phenotypes, exponentially growing liquid 
cultures expressing the indicated genes were equilibrated to an OD600 (A600nm) of 
0.2, and 4 µl samples were spotted in serial dilutions (1:6) onto YPD or selective 
media as required. Plates were incubated at 30 °C for 3 days. To assess tunicamycin 
and rapamycin sensitivity, cells were spotted on plates supplemented with 
tunicamycin (0.25 µg ml−1 or 0.75 µg ml−1, as indicated) or rapamycin (20 ng ml−1). Yeast 
strains and plasmids used in this study are presented in Supplementary Tables 1 
and 2, respectively. 

Tunicamycin (Sigma-Aldrich; 2.5 mg ml−1 stock) aliquots were stored at −20 °C 
and used within three months. Rapamycin (Sigma-Aldrich; 1 mM in DMSO) 
and Torin-1 (Santa Cruz Biotechnology; 1 mM in DMSO) aliquots were stored 
at −80 °C and used within a month. Cycloheximide (Sigma-Aldrich; 35 mg ml−1 
in ethanol) was used at 35 µg ml−1 final concentration to inhibit translation in 
yeast. 
Immunoblot analyses in yeast. 10 ml of exponentially growing cells adjusted to 
an OD600 of 0.2 were treated with 5 µg ml−1 tunicamycin (Tm), 0.2 µg ml−1 rapamycin, 
50 µg ml−1 Congo red (CR) or DMSO for 4 h at 30 °C. Cells were harvested by 
centrifugation at 8,000g for 30 s at 4 °C, pre-treated with 2 M LiAc and then 
0.4 M NaOH for 5 min on ice as in ref. 37. Cell lysates were then tested as in 
ref. 38. Briefly, cells were resuspended in 100 µl of lysis buffer (0.1 M NaOH, 
0.05 M EDTA, 2% SDS, 2% β-mercaptoethanol, one complete protease inhibitor 
cocktail tablet (PiC, Roche) per 50 ml, one phosphatase inhibitor cocktail tablet 
(PhosSTOP, Roche) per 10 ml). For the detection of poly-ubiquitinated proteins, 
the lysis buffer was supplemented with 5 mM N-ethylmaleimide (Sigma-Aldrich). 
Lysates were incubated at 90 °C for 10 min. 2.5 µl of 4 M acetic acid were 
subsequently added before vortexing for 30 s. Lysates were incubated at 90 °C for 
10 min and then cleared by centrifugation for 10 min at 16,000g. Supernatants 
were transferred to a clean tube and protein concentrations were measured by 
monitoring the OD280. Protein concentrations were equilibrated to 1 µg of total 
protein per µl, and 80 µl of lysates were mixed with 20 µl of 5x loading buffer 
(0.25 M Tris-HCl (pH 6.8), 10% SDS, 50% glycerol, 0.05% bromophenol blue). 
15 µg of total protein extract was loaded on Bolt 4–12% Bis-Tris Plus gels (Life 
Technologies) and resolved in MES buffer. Gel-separated protein samples were 
transferred to nitrocellulose membranes (Life Technologies). Membranes were 
cut and their fragments were incubated with antibodies to Kar2 (sc-33630; Santa 
Cruz Biotechnology, 1:1,000), GFP (ab290; Abcam, 1:5,000), HA (mHA.11; 
Covance, 1:2,000), TAP (CAB1001; Pierce, 1:1,000), ubiquitin (646302 (P4D1); 
BioLegend, 1:1,000), Adc17 (Bertolotti laboratory; 1:1,000), P-T737-Sch9 (Maeda 
laboratory; 1:5,000), Mpk1 (sc-6803; Santa Cruz Biotechnology, 1:1,000), Hog1 
(sc-9079; Santa Cruz Biotechnology, 1:1,000), P-Mpk1 (catalogue number 9101; 
Cell Signaling Technology, 1:1,000), Rpt5 (BML-PW8245; Enzo Life Sciences, 
1:5,000), 20S core subunits (CP) (BML-PW9355; Enzo Life Sciences, 1:2,000), Nas6 
(ab91447; Abcam, 1:1,000) and Nas2, Hsm3 and Rpn14 (Hochstrasser laboratory; 
1:1,000). Proteins were visualized by ECL Prime (GE Healthcare) using 
chemi-Smart 5000 or ChemiDoc Touch equipment (Bio-Rad). 
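
The sample-preparation arithmetic above (lysate adjusted to 1 µg per µl, 80 µl of lysate mixed with 20 µl of 5x loading buffer, 15 µg loaded per lane) can be checked with a short calculation. This is a minimal sketch for illustration only; the function name `loading_plan` and its parameters are hypothetical and are not part of the published protocol.

```python
def loading_plan(lysate_ug_per_ul=1.0, lysate_ul=80.0,
                 buffer_ul=20.0, buffer_strength_x=5.0, load_ug=15.0):
    """Return (final protein conc., final buffer strength, volume to load)."""
    total_ul = lysate_ul + buffer_ul
    # Protein is diluted by the added loading buffer.
    final_ug_per_ul = lysate_ug_per_ul * lysate_ul / total_ul
    # The 5x buffer is diluted to its working strength.
    final_buffer_x = buffer_strength_x * buffer_ul / total_ul
    # Volume of the mixed sample that contains the desired protein mass.
    load_ul = load_ug / final_ug_per_ul
    return final_ug_per_ul, final_buffer_x, load_ul


conc, buffer_x, volume = loading_plan()
print(f"{conc:.2f} ug/ul protein in {buffer_x:.0f}x buffer; load {volume:.2f} ul")
```

Under these assumptions the mixed sample is at 0.8 µg per µl in 1x buffer, so a 15 µg load corresponds to 18.75 µl per lane.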

For analyses of the phosphorylation status of Sch9, cell aliquots were taken at the 
indicated times and mixed with trichloroacetic acid (TCA) at a final concentration 
of 6%. Cell lysates were then prepared as described previously. 

Native PAGE in yeast. 30 ml of exponentially growing cells adjusted to an OD600 
of 0.2 were treated with 5 µg ml−1 tunicamycin, 0.2 µg ml−1 rapamycin or DMSO 
for 3 h at 32 °C. Cells were then harvested, washed in ice-cold water, resuspended 
in native lysis buffer (50 mM Tris-HCl (pH 7.4), 1 mM EDTA, 5 mM MgCl2, 1 mM 
DTT, 2 mM ATP) as in ref. 40, and disrupted with glass beads (10 times for 30 s) at 
4 °C. After removal of the glass beads, the extracts were cleared by centrifugation 
at 14,500g for 10 min at 4 °C. Protein concentration was measured by monitoring 
OD280 and 80 µl of adjusted extracts were mixed with 20 µl of 5x native loading 
buffer (0.25 M Tris-HCl (pH 6.8), 50% glycerol, 0.05% bromophenol blue). 25 µg 
of each extract were subjected to 4.2% native PAGE. The in-gel peptidase assay was 
performed as described previously before the samples were transferred to nitrocellulose 
membranes. Membranes were incubated with antibodies to 20S (PW9355; Biomol, 
1:2,000) and Rpt5 (PW8245; Biomol, 1:1,000). Proteins were visualized by ECL 
Prime (GE Healthcare). 

Microscopy. Images of yeast cells carrying a GFP-tagged SFP1 at the endogenous 
locus were taken using a Zeiss 710 confocal microscope. The excitation laser 
wavelength, emission detection bands and pinhole diameter were chosen based on the 
manufacturer's recommended settings for Hoechst 33342 and GFP. The laser power 
and detector gain settings were adjusted to avoid saturation. 


Quantitative RT-PCR. Total yeast RNA was extracted as previously described. 
15 µg of purified RNA was treated with the Turbo DNase kit (Ambion) and 1 µg of 
DNA-free RNA was reverse-transcribed into cDNA using the iScript cDNA synthesis kit 
(Bio-Rad Laboratories). cDNA was diluted 1:10 before the quantitative RT-PCR 
was performed. 

Quantitative RT-PCR was performed using SYBR Select Master Mix (4472908; 
Applied Biosystems) on a ViiA 7 system (Life Technologies) with the following 
primers: alg9 (forward): cacggatagtggctttggtgaacaattac, 
alg9 (reverse): tatgattatctggcagcaggaaagaacttggg, rpl18a (forward): 
gtgccagagccaagattgtt, rpl18a (reverse): tggagctctgacagctaattga, pre4 (forward): 
tgaaaatgcgtatgacaatcct, pre4 (reverse): tcaaaaatatagctgggttcgag, pre10 (forward): 
aagtggctcttattggegcta, pre10 (reverse): ttcgcagattgcctaccttt, rpt5 (forward): 
gcaaagaaccatgctggaat, rpt5 (reverse): tgacacgatcatcggagcta, rpt6 (forward): 
ttccattggctctactcgtg, rpt6 (reverse): aaacccgtccaattggttta, adc17 (forward): 
cgacgacttggagaacattg, adc17 (reverse): caatgcgtccactctctcat, nas6 (forward): 
tccaaaccttccttgttgcta, nas6 (reverse): tgcttggaaagaaactgacca, nas2 (forward): 
ctagaggcgtatttcagtgtgc, nas2 (reverse): tcaccaacgcagagtccat, hsm3 (forward): 
aaaatttctgctcaatgagatgc, hsm3 (reverse): gcgctcccatcacctatc, rpn14 (forward): 
tgccataatagaccgaggaag, rpn14 (reverse): aggcgaattgtaccatccaa. 
Expression of each gene was normalized to the housekeeping gene 
ALG9 and expressed as fold change after 2 h rapamycin treatment, calculated using 
the Pfaffl equation. 
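
For readers unfamiliar with the Pfaffl method, the fold-change calculation can be sketched as follows. This is a generic illustration of the standard efficiency-corrected ratio, not the authors' analysis script; the function name `pfaffl_ratio`, the Ct values and the assumed amplification efficiencies of 2.0 (perfect doubling per cycle) are hypothetical.

```python
def pfaffl_ratio(eff_target, ct_target_control, ct_target_treated,
                 eff_ref, ct_ref_control, ct_ref_treated):
    """Efficiency-corrected relative expression (treated versus control).

    ratio = E_target ** (Ct_control - Ct_treated of the target gene)
            / E_ref ** (Ct_control - Ct_treated of the reference gene)
    """
    target_term = eff_target ** (ct_target_control - ct_target_treated)
    ref_term = eff_ref ** (ct_ref_control - ct_ref_treated)
    return target_term / ref_term


# Hypothetical example: the target gene crosses threshold two cycles earlier
# after treatment, while the reference gene (ALG9 in the text) is unchanged.
fold_change = pfaffl_ratio(2.0, 25.0, 23.0, 2.0, 20.0, 20.0)
print(f"fold change = {fold_change:.1f}")
```

With both efficiencies equal to 2 the expression reduces to the familiar 2^(−ΔΔCt) form; measured primer efficiencies would replace the assumed value of 2.0.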

Mammalian cell culture. HeLa cells were from IGBMC (Strasbourg, France) with 
authentication and they were not used beyond passage 20 from original derivation. 
HeLa cells were routinely tested for mycoplasma contamination. HeLa cells 
were cultured in minimum essential medium (MEM) (11095-080; Life Technologies) 
supplemented with L-glutamine-penicillin-streptomycin solution (G6784; 
Sigma-Aldrich) and containing 10% fetal bovine serum (FBS). The medium was changed 
every 24 h. Medium replenishment experiments were carried out using DMEM 
(high glucose, no glutamine; 11960-044; Life Technologies) supplemented 
with L-glutamine-penicillin-streptomycin solution (G6784; Sigma-Aldrich) and 
containing 10% FBS. 

Mammalian cell treatments. For mTOR inhibition by Torin-1, cells were plated in 
6-well plates at a density of 400,000 cells per well. The medium was changed 24h 
after plating and a final concentration of 250nM Torin-1, 200nM rapamycin or 
DMSO was directly added to the medium 48 h after plating (confluence: 85-95%) 
for the indicated time. For starvation experiments, cells were plated in 6-well plates 
at a density of 400,000 cells per well. The medium was changed 24h after plating. 
48 h after plating, HeLa cells were washed twice with PBS before being cultured in 
Earle’s Balanced Salt Solution (EBSS) for the indicated time points. For medium 
replenishment experiments, cells were plated in 6-well plates at a density of 
400,000 cells per well. The medium was changed 24h after plating. 48h after 
plating, HeLa cells were washed twice with PBS before being cultured in fresh 
DMEM for the indicated time points. 

Immunoblot analyses in mammalian cells. Cells were rinsed twice with 
ice-cold PBS, harvested by centrifugation and lysed in 100 µl of ice-cold lysis buffer 
(50 mM Tris-HCl (pH 7.4), 150 mM NaCl, 1% Triton X-100, 0.1% SDS, 1% sodium 
deoxycholate, one complete protease inhibitor cocktail tablet (PiC, Roche) per 
50 ml, one phosphatase inhibitor cocktail tablet (PhosSTOP, Roche) per 10 ml). 
Lysates were then sonicated for 3 min (1 s on/1 s off). The soluble fractions from 
cell lysates were isolated by centrifugation at 16,000g for 10 min at 4 °C and protein 
concentrations were measured using a BCA protein assay kit (Thermo Scientific) 
and adjusted to 1 µg of total protein per µl. 80 µl of adjusted protein extracts were 
mixed with 20 µl of 5x loading buffer (0.25 M Tris-HCl (pH 6.8), 10% SDS, 50% 
glycerol, 0.05% bromophenol blue). 15 µg of total protein extract was loaded on 
Bolt 4–12% Bis-Tris Plus gels (Life Technologies) and resolved in MES buffer. 
Gel-separated protein samples were transferred to nitrocellulose membranes (Life 
Technologies). Membranes were cut and their fragments were incubated with 
antibodies to P-p70-S6 Kinase (P-S6K1) (catalogue number 9205; Cell Signaling 
Technology, 1:1,000), p70-S6 kinase (S6K1) (catalogue number 9202; Cell 
Signaling Technology, 1:1,000), Rpt6 (SUG-1B8; Euromedex, 1:5,000), Alpha-7 
(PW8110; Biomol, 1:1,000), p27 (Psmd9) (WH0005715M1; Sigma-Aldrich, 
1:1,000), p28 (Psmd10) (catalogue number 12985; Cell Signaling Technology, 
1:1,000), S5b (Psmd5) (LS-C133418; LifeSpan BioSciences Inc., 1:1,000), Rpn14 
(Paaf1) (ab103566; Abcam, 1:1,000), actin (ab3280; Abcam, 1:1,000), ERK5 (E1523; 
Sigma-Aldrich, 1:1,000) and POMP (ab170865; Abcam, 1:1,000). Proteins were 
visualized by ECL Prime (GE Healthcare) using chemi-Smart 5000 or ChemiDoc 
Touch equipment (Bio-Rad). 

For native PAGE, cells were rinsed twice with ice-cold PBS, harvested by 
centrifugation and lysed in 200 µl of native lysis buffer (50 mM Tris-HCl (pH 7.4), 1 mM 
EDTA, 5 mM MgCl2, 1 mM DTT, 2 mM ATP) as in ref. 40 and disrupted with glass 
beads (3 times for 20 s) at 4 °C. After removal of the glass beads, the extracts were 
cleared by centrifugation at 14,500g for 10 min at 4 °C. Protein concentration was 
measured by monitoring OD280 and 80 µl of adjusted extracts were mixed with 
20 µl of 5x native loading buffer (0.25 M Tris-HCl (pH 6.8), 50% glycerol, 0.05% 
bromophenol blue). 25 µg of each extract were subjected to 4.2% native PAGE. 
The in-gel peptidase assay was performed as described previously before the samples 
were transferred to nitrocellulose membranes. Membranes were incubated with 
antibodies to Alpha7 (PW8110; Biomol, 1:1,000) and Rpt6 (SUG-1B8; Euromedex, 
1:5,000). Proteins were visualized by ECL Prime (GE Healthcare). 

RNA interference. ON-TARGET plus SMARTpool siRNA for ERK5 or non- 
targeting control (Dharmacon) were used in knockdown experiments. HeLa cells 
(200,000 cells per well) were plated in 6-well plates. 24h after plating, media were 
replenished and siRNAs were delivered into cells using RNAiMAX (catalogue 
number 13778075 from Invitrogen) according to the manufacturer's instructions. 
The medium was changed every 24h post-transfection for a total of 3 days. Cells 
were then harvested and analysed by immunoblot. 

Statistical analysis. Representative results of at least three independent 
experiments (biological replicates) are shown in all panels. GraphPad Prism software was 
used for all statistical analyses. Data are presented as mean and standard deviation. 
For immunoblot quantifications, the level of each protein was normalized to PGK1 
in yeast and β-actin in mammalian cells and expressed as fold change. Data were 
analysed using unpaired Student's t-test or repeated measures analysis of variance 
(one-way ANOVA or two-way ANOVA where indicated). The level of significance 
was set at *P<0.05; **P<0.01; ***P<0.001; NS, not significant. 
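
The normalization and the unpaired t-test described above can be sketched in a few lines. This is an illustrative sketch only: the analyses in the paper were run in GraphPad Prism, the helper names `fold_change` and `student_t` are hypothetical, and the band intensities below are invented numbers. The ANOVA variants are not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def fold_change(target, loading, target_ctl, loading_ctl):
    """Band intensity normalized to its loading control, relative to control."""
    return (target / loading) / (target_ctl / loading_ctl)


def student_t(a, b):
    """Unpaired Student's t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical fold changes from n = 3 biological replicates per condition.
treated = [2.0, 3.0, 4.0]
control = [1.0, 2.0, 3.0]
print(f"treated: {mean(treated):.2f} +/- {stdev(treated):.2f} (mean +/- s.d.)")
print(f"t = {student_t(treated, control):.3f} with {len(treated) + len(control) - 2} d.f.")
```

The t statistic would then be compared against the t distribution with the stated degrees of freedom to obtain the P value.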
Extended Data Figure 1 | Adc17 induction is increased in mrs6-DAmP 
cells and occurs when Sfp1 is cytosolic. a, Immunoblots of the indicated 
proteins in lysates of wild-type and Mrs6-hypomorphic (mrs6-DAmP) 
yeast strains ± tunicamycin for 4 h. b, Representative images (DAPI, Sfp1-GFP 
and merge) of yeast cells carrying a GFP-tagged SFP1 at the endogenous locus, 
± tunicamycin for 4 h. Scale bar, 5 µm. Representative results of at least three 
independent experiments (biological replicates) are shown. 
Extended Data Figure 2 | Mpk1 is essential for tunicamycin and 
rapamycin survival and Adc17 induction. a, mpk1Δ cells transformed 
with wild-type MPK1 or a kinase-dead allele (MPK1-K52R) or empty 
vector were spotted in a sixfold dilution and grown on plates containing 
or lacking tunicamycin. b, Immunoblots of lysates of yeast strains 
shown in a, cultured for 4 h ± tunicamycin. c, Cells of the indicated 
genotype were spotted in a sixfold dilution and grown for 3 days at 30 °C 
tant yeast cells cultured for 4 h ± tunicamycin or rapamycin. e, Same as in a, using mpk1Δ 
HOG1. Representative results of at least three independent experiments 
Extended Data Figure 3 | Mpk1 MAPK pathway is essential for 
stress-mediated RACs induction. a, b, Immunoblots of the indicated proteins 
in lysates of wild-type yeast cells ± tunicamycin (a) or rapamycin (b) for 
the indicated time. c, g, Immunoblots of the indicated proteins in lysates 
of wild-type and bck1Δ cells cultured ± tunicamycin or rapamycin for 4 h. 
d, h, Immunoblots of the indicated proteins in lysates of wild-type and 
mkk1/2Δ cells cultured ± tunicamycin or rapamycin for 4 h. 
e, f, Immunoblots of the indicated proteins in lysates of wild-type or 
mpk1Δ cells ± 50 µg ml−1 Congo red (CR) for 4 h. Representative results of 
at least three independent experiments (biological replicates) are shown. 
Extended Data Figure 5 | Pba1 and Pba2 are induced by tunicamycin in 
an Mpk1-independent manner. a-d, Immunoblots of the indicated proteins 
in lysates of wild-type yeast cells carrying a TAP-tagged Pba1 (a), Pba2 
(b), Pba3 (c) and Pba4 (d) at the endogenous locus ± tunicamycin for 3 h. 
Representative results of at least three independent experiments (biological 
replicates) are shown. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Figure 6 | Mpk1 post-transcriptionally regulates 
proteasome subunits and RACs. a, b, Immunoblots of the indicated 
proteins in lysates of wild-type (a) and rpn4Δ (b) cells ± tunicamycin 
or rapamycin for 4 h. c, Immunoblots of the indicated proteins in lysates 
of wild-type yeast cells carrying a TAP-tagged RPN4 at the endogenous 
locus ± tunicamycin or rapamycin for 4 h. d, Immunoblots of the indicated 
proteins in lysates of wild-type and mpk1Δ cells ± tunicamycin or 
rapamycin for 4 h. e, rpn4Δ cells transformed with RPN4, MPK1, a 
kinase-dead allele of MPK1 (MPK1-K52R) or empty vector were spotted in a 
sixfold dilution and grown on plates containing or lacking tunicamycin. 
f, mpk1Δ cells transformed with MPK1, RPN4 or empty vector were 
spotted in a sixfold dilution and grown on plates containing or lacking 
tunicamycin where indicated. g, Immunoblots of the indicated proteins in 
lysates of wild-type and mpk1Δ cells carrying a TAP-tagged RPN4 at the 
endogenous locus ± tunicamycin or rapamycin for 4 h. h, i, Immunoblots 
of the indicated proteins in lysates of wild-type (h, i) and mpk1Δ (i) cells 
treated with different combinations of drugs: 5 µg ml−1 tunicamycin, 
0.2 µg ml−1 rapamycin and 35 µg ml−1 cycloheximide, where indicated, 
for 4 h. Representative results of at least three independent experiments 
(biological replicates) are shown. 
Extended Data Figure 7 | Mpk1 maintains the adequate levels of 
proteasome required to sustain protein degradation. a, c, Yeast cells 
of the indicated genotype expressing GFP-tagged Ura3-3 proteins were 
treated with cycloheximide and incubated at 37 °C for the indicated time. 
b, d, Quantifications from three independent experiments (biological 
replicates) such as the one shown in a and c. e, g, Cells of the indicated 
genotype expressing CPY*-HA (e) or Δss-CPY*-GFP (g) proteins 
were treated with tunicamycin for 4 h. f, h, Quantifications from three 
independent experiments (biological replicates) such as the one shown 
in e and g. i, k, Cells of the indicated genotype expressing CPY*-HA 
(i) or Δss-CPY*-GFP (k) proteins were treated with rapamycin for 4 h. 
j, l, Quantifications from three independent experiments (biological 
replicates) such as the one shown in i and k. b, d, f, h, j and l, Data 
are mean ± s.d. n = 3 biological replicates. *P < 0.05; **P < 0.01; 
***P < 0.001; NS, not significant (two-way ANOVA). 
Extended Data Figure 8 | Starvation inhibits TORC1 signalling, 
induces mammalian RACs and increases proteasome abundance. 
a, b, Immunoblots (a) and quantification (b) of the indicated proteins in 
lysates of HeLa cells after EBSS (Earle’s balanced salt solution) treatment 
for the indicated time. c, HeLa cell extracts following EBSS treatment for 
the indicated time were resolved on native PAGE (4.2%) and monitored 
using the fluorogenic substrate Suc-LLVY-AMC or by immunoblots. 
d, Quantification of the 26S proteasome activity (RPCP and RP2CP) 
of experiments such as the one shown in c. b, d, Data are mean ± s.d. 
n = 3 biological replicates. *P < 0.05; **P < 0.01; ***P < 0.001; NS, not 
significant (one-way ANOVA). Representative results of at least three 
independent experiments (biological replicates) are shown. 
Extended Data Figure 9 | TORC1 activation by nutrient replenishment 
decreases the abundance of RACs as well as 26S proteasome. 
a, b, Immunoblots (a) and quantification (b) of the indicated proteins 
in lysates of HeLa cells after replenishment with rich complete medium 
for the indicated time. c, Native PAGE (4.2%) of cell extracts from HeLa 
cells following media replenishment as in a, monitored by the fluorogenic 
substrate Suc-LLVY-AMC or by immunoblots. d, Quantification of the 
26S proteasome activity (RPCP and RP2CP) of experiments such as the 
one shown in c. b, d, Data are mean ± s.d. n = 3 biological replicates. 
*P < 0.05; **P < 0.01; ***P < 0.001; NS, not significant (one-way 
ANOVA). Representative results of at least three independent experiments 
(biological replicates) are shown. 
treated with 250 nM Torin-1 or 200 nM rapamycin for the indicated time. are shown. 


Heating of Jupiter’s upper atmosphere above the Great Red Spot 


J. O’Donoghue1, L. Moore1, T. S. Stallard2 & H. Melin2 


The temperatures of giant-planet upper atmospheres at mid- to 
low latitudes are measured to be hundreds of degrees warmer than 
simulations based on solar heating alone can explain. Modelling 
studies that focus on additional sources of heating have been unable 
to resolve this major discrepancy. Equatorward transport of energy 
from the hot auroral regions was expected to heat the low latitudes, 
but models have demonstrated that auroral energy is trapped at 
high latitudes, a consequence of the strong Coriolis forces on rapidly 
rotating planets. Wave heating, driven from below, represents 
another potential source of upper-atmospheric heating, though 
initial calculations have proven inconclusive for Jupiter, largely 
owing to a lack of observational constraints on wave parameters. 
Here we report that the upper atmosphere above Jupiter’s Great Red 
Spot—the largest storm in the Solar System—is hundreds of degrees 
hotter than anywhere else on the planet. This hotspot, by process 
of elimination, must be heated from below, and this detection is 
therefore strong evidence for coupling between Jupiter’s lower and 
upper atmospheres, probably the result of upwardly propagating 
acoustic or gravity waves. 

On 4 December 2012 (Coordinated Universal Time, UTC) we 
observed Jupiter for 9 h using the SpeX spectrometer on the NASA 
Infrared Telescope Facility. The spectrometer slit was aligned along the 
rotational axis in the north-south direction at local noon on the planet. 
This arrangement is illustrated in Fig. 1a, which contains a slit-jaw 
image showing bright auroral emissions at the poles as well as a local- 
ized Great Red Spot (GRS) emission enhancement at mid-latitudes. 
Exposures from the instrument in this setup give wavelength and 
intensity information as a function of latitude as shown in Fig. 1b. By 
exposing continuously throughout the night, we obtained longitudinal 
information for most of the planet (a Jovian day is 9h 56 min long). 




Figure 1 | The acquisition of Jovian spectra. a, Jupiter as observed by the 
SpeX slit-jaw imager and L-filter (3.13–3.53 µm), on 4 December 2012. 
Bright regions at the poles result from auroral emissions; the contrast at low 
and mid-latitudes has been enhanced for visibility. The vertical beige line in 


The spectrum in Fig. 1b shows strong emission features at six wavelengths, 
which appear prominently in the auroral regions and wane 
towards the equator. These are discrete ro-vibrational emission lines 
from H3+, a major ion in Jupiter’s ionosphere, the charged (plasma) 
component of the upper atmosphere. The colour contours highlight 
the weaker emissions from this ion across the body of the planet. Far 
from being uniform in intensity at low latitudes, all of the emission lines 
show a substantial intensity enhancement within the −13° to −27° 
planetocentric latitude range occupied by the GRS. As seen in the 
coloured contours of Fig. 1b, the H3+ emissions are isolated in wavelength, 
indicating that there is no continuum reflection of sunlight at 
the latitudes of the GRS. 

The ratio between two or more emission lines can be used to derive 
the temperature of the emitting ions. With the observing geometry 
used here, such temperatures are altitudinally averaged ‘column 
temperatures’ of H3+, where the majority of H3+ at Jupiter has been 
observed to be located at altitudes between 600 km and 1,000 km 
above the 1-bar pressure level. H3+ has been demonstrated to be 
in quasi-local thermodynamic equilibrium throughout the majority 
of Jupiter's upper atmosphere, meaning that derived temperatures 
are representative of the co-located ionosphere and (the mostly H2) 
thermosphere. In the Methods section we detail the data reduction 
techniques and temperature model fitting procedures, and in Fig. 2 we 
show two example model fits; only the strongest, outermost lines are 
used to fit temperatures, because the central H3+ lines are contaminated 
by telluric absorption. Note that the H3+ peak intensities 
at the GRS (Fig. 2a) are lower than those at 45° latitude; this is a result 
of lower column-integrated H3+ densities at lower latitude. Derived 
temperatures remain unaffected by the density differences because they 
are based entirely on H3+ line ratios. 
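
The reason a line ratio yields a temperature independent of column density can be illustrated with a generic two-line Boltzmann calculation. This is a minimal sketch under the assumption of optically thin emission in local thermodynamic equilibrium; it is not the model-fitting procedure described in the Methods. The function names, the lumped line strengths and the upper-level energies below are hypothetical placeholders, not real H3+ line parameters.

```python
import math

HC_OVER_K_CM_K = 1.438776877  # second radiation constant hc/k, in cm K


def line_intensity(strength, upper_energy_cm, temperature_k, column_density=1.0):
    """Relative line intensity: N * S * exp(-hc E_upper / (k T)).

    The partition function is omitted because it is common to both lines
    and cancels in the ratio.
    """
    boltzmann = math.exp(-HC_OVER_K_CM_K * upper_energy_cm / temperature_k)
    return column_density * strength * boltzmann


def temperature_from_ratio(i1, i2, s1, s2, e1_cm, e2_cm):
    """Invert I1/I2 = (S1/S2) * exp(-hc (E1 - E2) / (k T)) for T."""
    return HC_OVER_K_CM_K * (e1_cm - e2_cm) / math.log((s1 * i2) / (s2 * i1))


# Hypothetical line parameters (strengths in arbitrary units, energies in cm^-1).
S1, E1 = 1.0, 3000.0
S2, E2 = 2.0, 2500.0

# The same temperature is recovered for very different column densities.
for density in (1.0, 0.2):
    i1 = line_intensity(S1, E1, 900.0, density)
    i2 = line_intensity(S2, E2, 900.0, density)
    print(f"N = {density}: T = {temperature_from_ratio(i1, i2, S1, S2, E1, E2):.1f} K")
```

Because the column density multiplies both intensities equally, it drops out of the ratio, which is why a fainter region can still return a higher temperature.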
the middle of the image indicates the position of the spectrometer slit, which 
was aligned along the rotational axis. b, The co-added spectrum of seven 
GRS-containing exposures; dotted horizontal lines indicate the latitudinal 
range of the GRS. Further details are given in the Methods section. 
Figure 2 | Model fit to observed H3+ intensity as a function of wavelength. 
a, The data in Fig. 1b plotted between −13° and −19° planetocentric 
latitude; b, as for a but plotted between −40° and −49° planetocentric 
latitude. The H3+ model fit to the data is shown as a solid red line: only 
the H3+ lines at 3.383 µm and 3.454 µm are included in the temperature 


The difficulty in explaining the observed upper-atmospheric 
temperatures of the giant planets was realized more than 40 years ago, and 
has since been termed the giant-planet “energy crisis”. For Jupiter, 
only the observed temperatures within the auroral regions have been 
adequately explained, as the 1,000–1,400 K temperatures observed 
there result from auroral heating mechanisms that impart 200 GW 
of power per hemisphere through ion-neutral collisions and Joule 
heating. The low to mid-latitudes do not have such a heat source, 
and yet are measured to be near 800 K, which is 600 K warmer than 
can be accounted for by solar heating. If heating does not come 
from above (solar heating), and cannot be produced in situ via 
magnetospheric interactions, then a solution is likely to be found below. 

Gravity waves, generated in the lower atmosphere and breaking 
in the thermosphere, represent a potentially viable source of 
upper-atmospheric heating. Previous modelling studies, however, have led 
to inconclusive results for Jupiter: while viscous dissipation of gravity 
waves in Jupiter’s upper atmosphere can lead to warming of the order of 
10 K, sensible heat flux divergence can also lead to cooling by a similar 
amount, depending on the properties of the wave. Recent re-analysis 
of Galileo Probe data has shown that gravity waves impart a negligible 
amount of heating vertically to the stratosphere (gravity-wave motion 
is primarily longitudinal and latitudinal) and that heating near the 
thermosphere is less than 1 K per Jovian day. 

A more likely energy source is acoustic waves that heat from below 
(also via viscous dissipation); this form of heating requires vertical 
propagation of disturbances in the low-altitude atmosphere. Acoustic 
waves are produced above thunderstorms; on Jupiter such waves 
have been modelled to heat the upper atmosphere by 10 K per 
day, and on Earth they have been observed to heat the thermosphere over 
the Andes mountains. On Jupiter, acoustic-wave heating has also been 
modelled to potentially impart hundreds of degrees of heating to the 
upper atmosphere. However, to the best of our knowledge, no such 
coupling between the lower and upper atmosphere has been directly 
observed for the outer planets, so vertical coupling has not been 
seriously considered as a solution to the giant-planet energy crisis. 

Jupiter’s GRS is the largest storm in the Solar System, spanning 
22,000 km by 12,000 km in longitude and latitude, respectively. The 
GRS lies within the troposphere, with cloud tops reaching altitudes 
of 50 km, around 800 km below the H3+ layer. In Fig. 3 we show (red 
circles) that the pattern of H3+ intensity seen above the GRS, when 
fitted to our model, gives column-averaged H3+ temperatures of over 
1,600 K, higher than anywhere else on the planet, even in the auroral 
region. We also fitted temperatures to a swath of longitudes away from 
the GRS in order to illustrate that the enhancement in temperature 
occurs only within this longitude band. The latitudinal variation of 
temperatures away from the GRS is similar to the ranges previously 
observed, indicating that the high temperature above the GRS is 
localized in both latitude and longitude. 

derivation (see Methods for the full list). Telluric absorption, normalized 
to show sky contamination, is shown in grey. The derived temperatures are 
1,644 ± 161 K (a) and 900 ± 42 K (b) (standard errors of the mean). The 
H3+ model is extended to the central region (dotted red line) based on the 
temperatures and densities of the fits. Intensity errors are 1σ. 

The high temperature in the northern part of the GRS provides direct 
observational evidence of a localized heating process. We interpret 
the cause of this heating to be storm-enhanced atmospheric turbulence, 
which arises due to the flow shear between the storm and the 
surrounding atmosphere. Some of the waves generated by this turbulence 
must then propagate vertically upwards, depositing their energy as heat 
through viscous dissipation. It is unknown, at present, why the two red 
data points at GRS latitudes (grey shaded region in Fig. 3) differ by 800 K. 
One possibility is contamination of the H3+ line at 3.454 µm by the methane 
emission line at the same wavelength. Any additional intensity added 
to this H3+ line results in a lower temperature (for further detail see 
the Methods section). Thus, the temperature above the southern part 
of the GRS may be much higher than derived, but only if methane is 
preferentially brighter in the south. However, as the H3+ and CH4 lines 
at 3.454 µm are not separated spectrally in this work, it is not possible 
to conclude whether or not contamination is present. 

Figure 3 | Jovian H3+ temperatures versus planetocentric latitude. 
Column-averaged temperatures of H3+ shown here are each derived from 
model fits to the discrete H3+ emission lines as shown in Fig. 2. Red circle 
symbols correspond to the co-addition of GRS-related spectra (that is, 
from the spectral image in Fig. 1b) between 239° and 253° in Jovian system 
III Central Meridian Longitude (CML). The GRS latitudes are indicated by 
the grey shading. Blue triangle symbols were derived from exposures taken 
in the ranges 293°-359° and 0°-82° CML, that is, longitudes well separated 
from the GRS, representing the ‘ordinary’ background conditions based on 
solar heating alone. The modelled temperature of the upper atmosphere 
for these non-auroral regions is 203 K (ref. 1). Uncertainties are standard 
errors of the mean. 

An alternative physical explanation may relate to the relative velocities 
between the zonal wind and the GRS being greatest on the equatorward 
side of the storm: relative velocities are 75 m s⁻¹ in the north, 
15 m s⁻¹ in the storm core, and 25 m s⁻¹ at the poleward edge. The 
largest relative velocities would induce the strongest flow shear, lead- 
ing to the greatest turbulence and therefore the largest contribution 
to heating above. It is possible that evidence of such energy transfer 
from the lower to the upper atmosphere would be deposited en route 
in the intervening troposphere and upper stratosphere (0-150 km 
altitude), as there is a temperature enhancement of 10K encircling 
the GRS at these altitudes. However, this enhancement could also 
be due to the upwelling of material in the centre of the GRS, followed 
by increased adiabatic heating when the material downwells around 
the edges. 

The only previous map of Jovian H3+ temperatures that contains the 
GRS was made using ground-based data obtained in 1993 (ref. 17). The 
authors of ref. 17 did not mention the GRS, as no obvious signature 
was present in their temperature map. However, on the basis of their 
temperature contours and the expected location of the GRS at the time, 
we estimate that there was a measured temperature enhancement of 
50 K above the GRS. Such a minor temperature increase may indicate 
that the GRS-driven heating of Jupiter’s upper atmosphere is transient, 
but the spatial resolution of the 1993 observations was 9,800 km per 
pixel (at the equator), compared with 500 km per pixel in this study. 
Therefore, the previous data had much cruder resolution in latitude 
and longitude, and any localized temperature enhancements would 
have been smoothed out. 

In this work, the high-temperature region above the GRS is local- 
ized in latitude and longitude, indicating a large temperature gradient 
and perhaps a confinement by currently unknown upper-atmospheric 
dynamics. If wave heating driven from below is responsible for the 
temperatures observed in Jupiter's non-auroral upper atmosphere, then 
we might expect a relatively smooth temperature profile with latitude, 
punctuated by temperature enhancements above active storms. The 
GRS may then simply be the ‘smoking gun’ that dramatically illustrates 
this atmospheric coupling process, and provides the clue to solving the 
giant-planet energy crisis. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


Additional observing details. In Fig. 1, where we show the acquisition of Jovian 
spectra, Jupiter’s sub-Earth latitude was +-3°. The configuration of the SpeX instru- 
ment on the Infrared Telescope Facility was single order with a long slit, at a spec- 
tral resolution of R= 2,500. The slit length and width used were 60 arcsec and 
0.3 arcsec, respectively, and one pixel subtended 0.15 arcsec on the sky. In Fig. 2 
the model telluric transmission spectrum is obtained from the Atmospheric 
TRANsmission database (ATRAN; https://atran.sofia.usra.edu) for a spectral 
resolution of R = 2,500. The absorption wells near H3+ lines in the centre of 
the spectrum in Fig. 2 serve to highlight our reasons for avoiding that region in 
the temperature fitting. The attenuation of the signal in this figure by the sky is 
constant as a function of latitude because all of the temperature fits are from the 
same exposure, so any attenuation would affect each temperature as a function of 
latitude in the same way. 

Absolute calibration. We flux calibrated the data by using the photometric-standard 
A0V star HR1019 in the usual manner: that is, by assuming a blackbody curve for the 
temperature of the star (10,000 K in this case) and comparing it to what we observed. 
This serves a dual purpose in that by dividing the data by the flux calibration, 
it converts counts into physical units of flux and also yields a profile of what the 
sky has absorbed. The mean uncertainty in the absolute calibration as a function 
of wavelength is 4% of the flux, and the signal-to-noise ratio for the star was 24. 
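
The blackbody assumption behind this calibration can be sketched numerically. The snippet below is only an illustration of the idea: the function names and the two sample wavelengths are chosen here for the example and are not taken from the authors' reduction code.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m/s)
KB = 1.380649e-23    # Boltzmann constant (J/K)


def planck_lambda(wavelength_m, temperature_k):
    """Blackbody spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    x = H * C / (wavelength_m * KB * temperature_k)
    return (2.0 * H * C**2 / wavelength_m**5) / math.expm1(x)


def relative_response(observed_counts, wavelength_m, temperature_k=10_000.0):
    """Counts per unit of assumed stellar radiance (arbitrary units).

    Dividing science spectra by this curve removes the combined
    instrument and sky response, up to an overall scale factor.
    """
    return observed_counts / planck_lambda(wavelength_m, temperature_k)


# Shape of the assumed 10,000 K stellar spectrum across the fitted region
# (hypothetical sample wavelengths near the two fitted H3+ peaks).
ratio = planck_lambda(3.383e-6, 10_000.0) / planck_lambda(3.454e-6, 10_000.0)
```

Under this assumption the stellar continuum changes by less than 10% between the two fitted peaks, so the calibration mainly corrects for the sky and instrument rather than for the intrinsic stellar slope.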

Instrumental effects. These are accounted for by flat fielding, dark-current 
subtraction and hot pixel removal in every frame. The calibrated Jovian spectra 
(containing uncertainties in absolute calibration above) also include noise from the 
instrumentation and Earth’s atmospheric attenuation. The uncertainties are thus 
estimated from the standard deviation of the backgrounds in the final spectrum. 
All errors are propagated through with the absolute calibration and uncertainty 
to produce the error bars in intensity displayed in Fig. 2 and the temperature 
estimates in Fig. 3. 

H3+ fitting. To find the temperatures from Fig. 1b, we used a spectroscopic H3+ 
line list (ref. 25) and the most recent H3+ partition function coefficients. The spectrum 
of H3+ can be treated as a sum of Gaussian distribution curves, with each curve a 
function of temperature. This ‘equation of a spectrum’ is solved in order to derive 
the temperature. This technique has been used to derive H3+ temperatures on 
Jupiter, Saturn and Uranus for decades, with typical uncertainties of 10%. The 
fitting routines used are the same as those in previous literature, and include a 
list of over three million ro-vibrational transition lines of H3+ (ref. 25). The fitting 
routine uses the most recent partition function constants to establish a temperature; 
these constants are applicable for temperatures between 100 K and 10,000 K 
(whereupon the ion dissociates). 

Handling of non-H3+ intensity. We now address the possibility of contamination of 
the H3+ intensity by other sources at Jupiter. Possibility 1 is that there is enhanced reflection 
of sunlight from haze at the location of the GRS, but this is not seen adjacent in 
wavelength to any lines in Fig. 1 and can consequently be ruled out. Possibility 
2 pertains to emission from neutral gases. Only the two intensity peaks overlaid 
with solid red lines are included in the final fit; the left peak contains 
the H3+ lines at 3.38285 µm and 3.38391 µm, whereas the right peak includes 
lines at 3.45502 µm, 3.45483 µm and 3.45468 µm. Methane (CH4), the dominant hydrocarbon 
in Jupiter’s atmosphere, is known to emit at a number of wavelengths in this 
region, namely 3.380 µm, 3.392 µm, 3.404 µm, 3.415 µm, 3.440 µm and 3.454 µm. 
Some of these are visible in Fig. 1 (for example, 3.404 µm) and some are not (for 
example, 3.380 µm), but we are mainly interested in any that could affect the 
fitted H3+, which means ignoring, for now, the central portion of Fig. 2. The CH4 
emission line at 3.454 µm is the only line that could possibly fall on a fitted H3+ 
line, and the effect of it doing so would be that the line ratio between the H3+ 
lines denoted by the solid red fit would be larger. For this particular set of lines, 
if the ratio is increased, then the temperature estimate decreases: this can be seen 
by comparing the ratios of lines in Fig. 2, with the lower-ratio GRS spectrum 
corresponding to 1,644 ± 161 K, while the higher-ratio non-GRS spectrum is 
fitted as 900 ± 42 K (standard errors of the mean). In other words, if methane were 
contributing emission to this line, then accounting for it by removing some of 
that intensity would result in the fitted GRS temperature being even higher 
than the 1,600 K derived here. 

Code availability. The H3+ spectroscopic line list used in the model is available 
online at http://www.exomol.com/data/molecules. In addition, an online H3+ 
intensity calculator is available at http://h3plus.uiuc.edu. The model-fitting 
routines and reduction code used in this work are available on request from J.O’D. 
(jameso@bu.edu). Our data reduction pipeline makes substantial use of the NASA 
Astronomy IDL library, available online at http://idlastro.gsfc.nasa.gov. 


A photon-photon quantum gate based on a single 
atom in an optical resonator 


Bastian Hacker, Stephan Welte, Gerhard Rempe & Stephan Ritter 


That two photons pass each other undisturbed in free space is 
ideal for the faithful transmission of information, but prohibits an 
interaction between the photons. Such an interaction is, however, 
required for a plethora of applications in optical quantum 
information processing. The long-standing challenge here is to 
realize a deterministic photon-photon gate, that is, a mutually 
controlled logic operation on the quantum states of the photons. 
This requires an interaction so strong that each of the two photons 
can shift the other’s phase by π radians. For polarization qubits, this 
amounts to the conditional flipping of one photon’s polarization to 
an orthogonal state. So far, only probabilistic gates based on linear 
optics and photon detectors have been realized, because “no known 
or foreseen material has an optical nonlinearity strong enough to 
implement this conditional phase shift”. Meanwhile, tremendous 
progress in the development of quantum-nonlinear systems has 
opened up new possibilities for single-photon experiments. 
Platforms range from Rydberg blockade in atomic ensembles to 
single-atom cavity quantum electrodynamics. Applications such as 
single-photon switches and transistors, two-photon gateways, 
nondestructive photon detectors, photon routers and nonlinear 
phase shifters have been demonstrated, but none of them with 
the ideal information carriers: optical qubits in discriminable 
modes. Here we use the strong light-matter coupling provided by a 
single atom in a high-finesse optical resonator to realize the Duan- 
Kimble protocol (ref. 19) of a universal controlled phase flip (π phase shift) 
photon-photon quantum gate. We achieve an average gate fidelity 
of (76.2 + 3.6) per cent and specifically demonstrate the capability 
of conditional polarization flipping as well as entanglement 
generation between independent input photons. This photon- 
photon quantum gate is a universal quantum logic element, and 
therefore could perform most existing two-photon operations. The 
demonstrated feasibility of deterministic protocols for the optical 
processing of quantum information could lead to new applications 
in which photons are essential, especially long-distance quantum 
communication and scalable quantum computing. 

Perhaps the simplest way to realize a photonic two-qubit gate is to 
overlap two photons in a nonlinear medium. However, it has been 
argued that this cannot ensure full mutual information transfer 
between the qubits for reasons of locality and causality. Instead, a 
viable strategy is to keep the two photons separate, change the nonlinear 
medium using the first photon, use this change to affect the second 
photon, and, finally, make the first photon interact with the medium 
again to ensure gate reciprocity. These three sequential interactions 
enable full mutual information exchange between the two qubits, as 
is required for a gate, even though the photons never meet directly. 

Our experimental realization of a controlled phase flip (CPF) photon- 
photon gate builds on the proposal by Duan and Kimble (ref. 19). The medium 
is a single atom strongly coupled to a cavity and the interactions hap- 
pen upon reflection of each photon off the atom-cavity system. The 
proposal of ref. 19 considers three reflections, but here we replace the 
second reflection of the first photon by a measurement of the atomic 
state and classical phase feedback on the first photon (analogous to a 
proposal in which the roles of light and matter are interchanged). 
In practice, this allows us to achieve better fidelities and higher efficien- 
cies, and to use a simpler setup, compared to that of the proposed 
scheme. 

We employ a single ⁸⁷Rb atom trapped in a three-dimensional optical 
lattice at the centre of a one-sided optical high-finesse cavity (Fig. 1). 
The measured cavity quantum electrodynamics parameters for the 
relevant transition |↑⟩ = |F=2, mF=2⟩ ↔ |e⟩ = |F=3, mF=3⟩ of the 
D2 line are (g, κ, γ) = 2π(7, 2.5, 3) MHz. Here, F and mF are the quantum 
numbers describing the total atomic angular momentum and its projec- 
tion onto the quantization axis, respectively, g denotes the atom-cavity 
coupling constant, and κ and γ are the decay rates of the cavity field and 
the atomic dipole, respectively. The atom takes on the role of an ancilla 
qubit, implemented in the basis |↓⟩ = |F=1, mF=1⟩ and |↑⟩, with the 
quantization axis along the cavity axis. Both photonic qubits are indi- 
vidually encoded in the polarization using the notation |L⟩ and |R⟩ for 
a left- and a right-handed photon, respectively. They are consecutively 
coupled into the cavity beam path via a non-polarizing beam splitter 
(98.5% transmission), which takes on the role of a polarization- 
independent circulator. The photons as well as the empty cavity are on 
resonance with the transition |↑⟩ ↔ |e⟩ at 780 nm. Only the atom in |↑⟩ 
and the photon in |R⟩ are strongly coupled, because the |↓⟩ ↔ |e⟩ transi- 
tion is detuned by the ground-state hyperfine splitting of 6.8 GHz, and 
the left-circularly polarized transition |↑⟩ ↔ |F=3, mF=1⟩ is shifted 
out of resonance by a dynamical Stark shift induced by the laser that 
traps the atom. The strong light-matter coupling between |↑⟩ and |R⟩ 
shifts the phase of a reflected photon by π compared to the cases where 
the atom occupies |↓⟩ or the photon is in |L⟩. Thus, each reflection 
constitutes a bidirectional controlled-Z interaction between the 
atomic and photonic qubit (red boxes in Fig. 2a). 

Figure 2a depicts the experimental implementation of the photon- 
photon gate as a quantum circuit diagram. In short, the protocol starts 
with arbitrary photonic input qubits |p1⟩ and |p2⟩ and with the atom opti- 
cally pumped to |↑⟩. After this initialization, two consecutive atomic- 
qubit rotations combined with controlled-Z atom-photon quantum 
gates are performed. The purpose of the rotations is to maximize the 
effect of the subsequent gates. Note that up to this point the first pho- 
ton has the capability to act via the atom onto the second photon. To 
implement a back-action of the second photon onto the first one, the 
protocol ends with a measurement of the atomic qubit and feedback 
onto the first photon. This measurement has the additional advantage 
that it removes any possible entanglement of the atom with the photons, 
as required for an ancillary qubit. A longer and detailed stepwise anal- 
ysis of the above protocol as well as the characterization of the Raman 
lasers used for the implementation of the atomic-state rotations can be 
found in the Methods. 

Figure 1 | Schematic of our setup. Qubit-carrying weak coherent photon 
pulses p1 and p2 enter in two separate spatio-temporal modes via a non- 
polarizing 98.5% transmitting beam splitter (NPBS) that acts effectively 
as a circulator. The photons are subsequently reflected from the cavity 
containing a single atom before a switch directs them into a delay fibre. 
While p1 and p2 are stored in the fibre, the state of the atom is read out 
via fluorescence photons (blue arrows) that the switch directs towards a 
single-photon detector (SPD). A field programmable gate array (FPGA) 
applies a conditional phase feedback to p1 via an electro-optical modulator 
(EOM). Eventually, the photons leave the gate setup towards polarization 
analysers. The inset shows the atomic energy level scheme. The three 
depicted, relevant levels of ⁸⁷Rb and the photon polarizations are defined 
in the main text. The photons and the empty cavity are on resonance with 
the atomic transition |↑⟩ ↔ |e⟩. 

To apply this scheme in practice, the qubits have to be stored and 
controlled in an appropriately timed sequence, as follows. After the first 
photon p1 is reflected, it directly enters a 1.2-km-long delay fibre. The 
delay time of 6 µs is sufficient to allow for reflection of both photons 
from the cavity, two coherent spin rotations, and state detection on 
the atom (Fig. 2b). The two photon wave packets are in independent 
spatio-temporal modes, which can in principle be arbitrarily shaped. 
The only requirement is that the frequency spectrum should fall within 
the acceptance bandwidth of the cavity (0.7 MHz for ±0.1π phase shift 
accuracy). We used Gaussian-like envelopes of 0.6 µs full-width at 
half-maximum (FWHM) within individual time windows of width 
1.3 µs, such that the corresponding FWHM bandwidth of 0.7 MHz 
leads to an acceptable phase-shift spread. 
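
The two timing figures quoted here can be checked with a short back-of-the-envelope sketch. It assumes a fibre group index of about 1.468 (a typical value for silica fibre, not stated in the text) and a transform-limited Gaussian intensity envelope, for which the time-bandwidth product is 2 ln 2 / π; the function names are illustrative.

```python
import math

C_VACUUM = 2.99792458e8  # speed of light in vacuum (m/s)


def fibre_delay(length_m, group_index=1.468):
    """Propagation delay through a fibre; the group index is an assumed value."""
    return length_m * group_index / C_VACUUM


def gaussian_fwhm_bandwidth(fwhm_time_s):
    """Spectral FWHM (Hz) of a transform-limited Gaussian intensity envelope."""
    return 2.0 * math.log(2.0) / (math.pi * fwhm_time_s)


delay_s = fibre_delay(1200.0)                   # 1.2-km-long delay fibre
bandwidth_hz = gaussian_fwhm_bandwidth(0.6e-6)  # 0.6 µs FWHM pulses
print(f"delay = {delay_s * 1e6:.2f} us, bandwidth = {bandwidth_hz / 1e6:.2f} MHz")
```

Both results land close to the quoted 6 µs delay and 0.7 MHz bandwidth, which suggests the pulses are near the transform limit.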

After the last spin rotation, Purcell-enhanced fluorescence state 
detection of the atomic qubit is performed. This is achieved within 
1.2 µs with a laser beam resonant with the |↑⟩ ↔ |e⟩ transition and 
impinging perpendicular to the cavity axis (blue beam in Fig. 1). 
This yields zero fluorescence photons for |↓⟩ and a near-Poissonian- 
distributed photon number with an average of 4 for |↑⟩, resulting in a 
discrimination fidelity of 96%. The fluorescence light shares the same 
spatial mode as the gate photons and needs to be detected before the 
first photon leaves the delay fibre. Separation of the fluorescence light 
from the qubit photons is achieved with an efficient free-space acousto- 
optical deflector (labelled ‘Switch’ in Fig. 1). Qubit photons pass the 
deactivated acousto-optical deflector straight towards the delay fibre, 
whereas state-detection photons are deflected into the first diffraction 
order directed at a single-photon detector. The corresponding detection 
events are evaluated in real time by a field programmable gate array, 
which activates a π phase shift on the |R⟩ component of the first gate 
photon if the atom was detected in |↑⟩. No phase shift is applied if the 
atom was found in |↓⟩. This conditional phase shift is performed by 
an electro-optical modulator with a switching time of 0.1 µs, which 
is ready when p1 leaves the delay fibre and is reset before p2 appears 
at the end of the fibre. The experiment runs at a rate of 500 Hz, with 
each execution preceded by atom cooling, atomic state preparation via 
optical pumping and probing of cavity transmission to confirm success 
of the initialization. All experiments with one detected qubit photon in 
each of the two temporal output modes are evaluated without further 
post-selection. 

If both input photons are circularly polarized, the photon-photon 
gate appears as a CPF gate (see Methods) characterized by: 

$$|RR\rangle \to |RR\rangle, \quad |RL\rangle \to |RL\rangle, \quad |LR\rangle \to -|LR\rangle, \quad |LL\rangle \to |LL\rangle$$

Figure 2 | The photon-photon gate mechanism. a, Quantum circuit 
diagram. The sequence of controlled-Z gates between the atomic ancilla 
qubit and the gate photons interleaved with rotations on the atomic qubit 
acts as a pure CPF gate on the input photon state |p1 p2⟩. Note that the 
dashed box is equivalent to the reflection-based quantum controlled-Z 
gate of the original proposal via the principles of deferred and implicit 
measurement. b, Pulse sequence showing the timing of the experimental 
steps of the gate protocol. A delay fibre of length 1.2 km is used to store the 
gate photons for 6 µs. 


As with any quantum gate, it can also be expressed in other bases. 
We define the linear polarization bases as |H) = se (IR)+{L)) : 


[V) = 7 (IR)-IL))> |D) = 35 (IR)+iL)). and |A) = > (R)+IL)), 


respectively. With one of the photons being circularly and the other one 
linearly polarized, the gate will act as a controlled-NOT gate with the 
circular qubit being the control and the linear one being the target 
qubit. When both photons enter in linear polarization states, the gate 
will turn the two separable inputs into a maximally entangled state. 

We characterized the gate by applying it to various pairs of separable 
input-qubit combinations and by measuring the average outcome from 
a large set of repeated trials. The input consisted of two independent 
weak coherent pulses each impinging with an average photon number 
of n̄ = 0.17 onto the cavity. The choice of n̄ is a compromise between 
measurement time and measured gate fidelity. While lowering 
n̄ reduces the data rate because of the high probability of zero-photon 
events in either of the two photon modes, increasing n̄ raises the multi- 
photon probability per pulse, thereby deteriorating the measured gate 
fidelity. 
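
The trade-off can be made concrete with Poissonian photon statistics, which is the standard model for a weak coherent pulse. The sketch below uses the quoted mean photon number of 0.17; the function and dictionary key names are illustrative only.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k photons in a coherent pulse with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def pulse_statistics(mean_photons):
    p0 = poisson_pmf(0, mean_photons)
    p1 = poisson_pmf(1, mean_photons)
    p_multi = 1.0 - p0 - p1          # two or more photons in one mode
    p_nonempty = 1.0 - p0
    return {
        "empty": p0,
        "single": p1,
        "multi": p_multi,
        "multi_given_nonempty": p_multi / p_nonempty,
        "both_modes_nonempty": p_nonempty**2,
    }


stats = pulse_statistics(0.17)
```

With these numbers, about 8% of populated pulses carry more than one photon, while only about 2.4% of trials have both modes populated, which illustrates why neither a much larger nor a much smaller mean photon number is attractive.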

First, we processed the four different input states of a controlled- 
NOT basis, that is, all combinations of photon p1 in the circular basis 
and p2 in a linear basis, and analysed them in the corresponding 
measurement bases. The resulting truth table is depicted in Fig. 3 and 
shows an overlap with the case of an ideal controlled-NOT gate of 
F_CNOT = (76.9 ± 1.5)%. 

Figure 3 | Truth table of the controlled-NOT photon-photon gate. 
The gate flips the linear polarization of the target photon p2 if the control 
photon p1 is in the state |L⟩, but it leaves the target qubit unchanged if 
the control photon is in |R⟩. The vertical axis gives the probability of 
measuring a certain output state given the designated input state. The 
truth table for an ideal controlled-NOT gate is indicated by the four 
transparent bars with P = 1. The black T-shaped bars represent statistical 
errors (standard error of the mean) on each entry (root mean square 
2.2%), computed via linear error propagation assuming independent 
photon statistics. 

A decisive property of a quantum gate that distinguishes it from its 
classical counterpart is its capability to generate entanglement. For both 
input photons in the linear polarization state |D⟩, the gate ideally 
creates the maximally entangled Bell state |Ψ+⟩ = (1/√2)(|DL⟩ + |AR⟩). We 
reconstructed the output of the gate for the input state |DD⟩ 
from 1,378 detected photon pairs via linear inversion and obtained 
the density matrix ρ depicted in Fig. 4. It has a fidelity 
F_Ψ+ = ⟨Ψ+|ρ|Ψ+⟩ = (72.9 ± 2.8)% with the ideal Bell state (unbiased 
linear estimate). The generation of this entangled state from a separable 
input state directly sets a non-tight bound for the entangling capability 
(smallest eigenvalue of the partially transposed density matrix) of our 
gate, C ≤ −0.242 ± 0.028, which is −0.5 for the ideal CPF gate and 
where a negative C value denotes that the gate is entangling. We remark 
that the total data set can be separated into two subsets of equal 
size corresponding to the outcome of the atomic state detection 
being |↓⟩ or |↑⟩. The respective fidelities are F↓ = (74.4 ± 3.9)% and 
F↑ = (71.5 ± 4.2)%, that is, the gate works comparably well in both 
cases. 
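
The value of −0.5 quoted for the ideal CPF gate can be reproduced with a few lines of linear algebra. The sketch below applies the ideal gate to a product of equal-weight superpositions of |R⟩ and |L⟩ and evaluates the smallest eigenvalue of the partially transposed density matrix. The specific input phases are an assumption of this example; any equal-weight superposition on each photon gives the same eigenvalue, because the gate is diagonal in the circular basis.

```python
import numpy as np

# Photon basis ordering: RR, RL, LR, LL.
CPF = np.diag([1.0, 1.0, -1.0, 1.0])


def min_partial_transpose_eigenvalue(state):
    """Smallest eigenvalue of the partial transpose of a pure two-qubit state."""
    rho = np.outer(state, state.conj())
    # Indices (i1, i2, j1, j2); swapping i2 and j2 transposes the second qubit.
    rho_pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return float(np.linalg.eigvalsh(rho_pt).min())


# Equal superposition of R and L for each photon (a linear polarization).
linear = np.array([1.0, 1.0]) / np.sqrt(2.0)
product_in = np.kron(linear, linear)

before = min_partial_transpose_eigenvalue(product_in)        # separable input
after = min_partial_transpose_eigenvalue(CPF @ product_in)   # after ideal gate
```

The separable input gives a non-negative smallest eigenvalue, whereas the ideal gate output reaches −0.5, the value of a maximally entangled two-qubit state.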

As an overall measure of the gate performance we determined the 
average gate fidelity F, which is equal to the average fidelity of 6 x 6 
output states generated from the input states on all canonical polarization 
axes (H, V, D, A, R, L) with the theoretically expected ideal outcomes. 
All 36 state fidelities were estimated linearly and bias-free with rand- 
omized tomographically complete basis settings. Although we collected 
only insignificant statistics of 80 detected photon pairs on each of the 
output states, their combination gives a meaningful measure of 
F = (76.2 ± 3.6)%. The deviation from unity is well understood for our 
system and results from technical imperfections, which we discuss below. 

The efficiency of the presented gate, which is the combined trans- 
mission probability for two photons, is unity for the ideal scheme, but 
gets reduced by several experimental imperfections. It is polarization- 
independent because all optical elements, including the cavity, have 
near-equal losses for all polarizations. The two main loss channels are 
the long delay fibre (transmission T= 40.4%) and the limited cavity 
reflectivity (R =67%). The latter results from the cavity not being per- 
fectly single-sided and having a finite cooperativity of C=3.3. All other 
optical elements have a combined transmission of 81%, dominated by 
the fibre-coupling efficiency and absorption of the acousto-optical 
deflector switch. This yields a total experimental gate efficiency of 
(22%)² = 4.8%. Despite the transmission losses, characteristic for all 
photonic devices, the protocol itself is deterministic. The largest poten- 
tial improvement is offered by eliminating the fibre-induced losses, for 
instance by a free-space delay line, a delay cavity or an efficient optical 
quantum memory. 
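
The efficiency budget is a simple product of the quoted transmissions, squared for two photons. The sketch below repeats that arithmetic and also evaluates the cooperativity from the quoted (g, κ, γ), assuming the definition C = g²/(2κγ); that definition is not spelled out in the text and is adopted here because it reproduces the stated value.

```python
# Values quoted in the text.
FIBRE_TRANSMISSION = 0.404
CAVITY_REFLECTIVITY = 0.67
OTHER_OPTICS = 0.81
G_MHZ, KAPPA_MHZ, GAMMA_MHZ = 7.0, 2.5, 3.0   # (g, kappa, gamma) / 2 pi

# Transmission probability for one photon, then for both photons.
single_photon = FIBRE_TRANSMISSION * CAVITY_REFLECTIVITY * OTHER_OPTICS
two_photon = single_photon**2

# Assumed definition of the cooperativity: C = g^2 / (2 kappa gamma).
cooperativity = G_MHZ**2 / (2.0 * KAPPA_MHZ * GAMMA_MHZ)

print(f"single photon: {single_photon:.1%}, two photons: {two_photon:.1%}")
print(f"cooperativity: {cooperativity:.2f}")
```

Because the delay fibre is the smallest factor in the product, replacing it has the largest leverage on the squared two-photon efficiency.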

Figure 4 | Reconstructed density matrix of the entangled two-photon 
state created by the gate from the separable input state |DD⟩. 
a, b, Depicted are the real (a) and imaginary (b) parts of the elements of 
the density matrix. The transparent bars indicate the ideal density matrix 
for |Ψ+⟩ in the chosen basis. Statistical errors (standard error of the mean) 
on each entry (root mean square 2.4%) are drawn as black T-shaped bars. 

We have modelled all known sources of error (see Methods) to repro- 
duce the deviation of the experimental gate fidelity from unity. Here 
we quote the reductions in fidelity that each individual effect would 
introduce to an otherwise perfect gate. The largest contribution stems 
from using weak coherent pulses to characterize the gate and is there- 
fore not intrinsic to the performance of the gate itself. First, there is a 
considerable probability of having two photons in one qubit mode if 
it is populated, resulting in a phase flip of 2π instead of π, causing an 
overall reduction of the gate fidelity by 12%. Second, the probability of 
having both qubit modes populated is small, such that detector dark 
counts contribute a 2% error. The measured gate fidelity could therefore 
be greatly improved by employing a true single-photon source. 

The relatively short delay introduced by the optical fibre restricts the 
temporal windows for the photon pulses and atomic state detection. 
The resulting bandwidth of the photons reduces the gate fidelity by 
6%. The obvious solution is to choose a longer delay. Further errors 
can be attributed to the characteristics of the optical cavity (5%), the 
state of the atom (6%), and other optical elements (2%). The cavity has 
a polarization-eigenmode splitting of 420 kHz that could be eliminated 
by mirror selection. Neither the resonance frequency of the cavity nor 
the spatial overlap between its mode and the fibre mode are perfectly 
controlled (see Methods). The latter could be improved with additional 
or better optical elements. Fidelity reductions associated with the state 
of the atom are due to imperfect state preparation, manipulation and 
detection, and decoherence. Improvements are expected from the 
application of cavity-enhanced state detection to herald successful state 
preparation, Raman sideband cooling to eliminate variations in the 
Stark shift of the atom, and composite pulses to optimize the state rota- 
tions. The limited precision of polarization settings and polarization 
drifts inside the delay fibre are the main contribution from other optical 
elements. The latter could be improved using active stabilization. 
The wealth of realistic suggestions for improvement given above shows 
that progress towards even higher fidelities is certainly feasible for the 
gate implementation presented here. 

The photon-photon gate as demonstrated here follows a determin- 
istic protocol and could therefore be a scalable building block for new 
photon-processing tasks such as those required by quantum repeaters, 
for the generation of photonic cluster states or quantum computers. 
The gate’s ability to entangle independent photons could be a resource 
for quantum communication. Moreover, our gate could serve as the 
central processing unit of an all-optical quantum computer, envisioned 
to process pairs of photonic qubits that are individually stored in and 
retrieved from a quantum cache that may in principle be arbitrarily 
large. Such a cache would consist of an addressable array of quan- 
tum memories, individually connected to the gate via optical fibres. 
Eventually, such architecture might even be implemented with photonic 
waveguides on a chip. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 

Composition of the photon-photon CPF gate. The action of the quantum circuit 
diagram depicted in Fig. 2a can be computed in the eight-dimensional Hilbert 
space spanned by the atomic ancilla qubit and the two photonic qubits. The atomic 
single-qubit rotations by π/2 and −π/2 are described by the operators 

$$\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix},$$

respectively, in the basis {|↑⟩, |↓⟩}. The atom-photon controlled-Z 
gate is described by U_ap = diag(−1, 1, 1, 1) in the basis {|↑R⟩, |↑L⟩, |↓R⟩, |↓L⟩}. As 
indicated in Fig. 2a, the atom is initially prepared in |↑⟩. Any input state of the two 
photonic qubits, including entangled states, can be written as 

$$|p_1 p_2\rangle = c_{RR}|RR\rangle + c_{RL}|RL\rangle + c_{LR}|LR\rangle + c_{LL}|LL\rangle$$

defined by the four complex numbers c_RR, c_RL, c_LR and c_LL. Henceforth, we will 
use the compact notation |rr⟩ := c_RR|RR⟩, |rl⟩ := c_RL|RL⟩, |lr⟩ := c_LR|LR⟩, and 
|ll⟩ := c_LL|LL⟩. Therefore, any photon-photon gate operation starts in the collective 
initial state: 

$$|{\uparrow}\rangle\,(|rr\rangle + |rl\rangle + |lr\rangle + |ll\rangle)$$

The first π/2 rotation brings the atom into a superposition: 

$$\frac{1}{\sqrt{2}}\,(|{\uparrow}\rangle + |{\downarrow}\rangle)\,(|rr\rangle + |rl\rangle + |lr\rangle + |ll\rangle)$$

followed by a controlled-Z interaction between the atom and the first photon, 
which flips the sign of all states with the atom in |↑⟩ and the first photon in |R⟩: 

$$\frac{1}{\sqrt{2}}\left[(-|{\uparrow}\rangle + |{\downarrow}\rangle)\,(|rr\rangle + |rl\rangle) + (|{\uparrow}\rangle + |{\downarrow}\rangle)\,(|lr\rangle + |ll\rangle)\right]$$

Subsequent rotation of the atom by −π/2 creates the state: 

$$|{\downarrow}\rangle\,(|rr\rangle + |rl\rangle) + |{\uparrow}\rangle\,(|lr\rangle + |ll\rangle)$$

Reflection of the second photon flips the sign of all states with the atom in |↑⟩ and 
the second photon in |R⟩: 

$$|{\downarrow}\rangle\,(|rr\rangle + |rl\rangle) + |{\uparrow}\rangle\,(-|lr\rangle + |ll\rangle)$$

The final rotation of the atom by π/2 yields: 

$$\frac{1}{\sqrt{2}}\left[(-|{\uparrow}\rangle + |{\downarrow}\rangle)\,(|rr\rangle + |rl\rangle) + (|{\uparrow}\rangle + |{\downarrow}\rangle)\,(-|lr\rangle + |ll\rangle)\right]$$

At this point the state of the atom is measured. There are two equally probable 
outcomes projecting the two-photon state accordingly: 

$$|{\uparrow}\rangle: \; -|rr\rangle - |rl\rangle - |lr\rangle + |ll\rangle$$

and 

$$|{\downarrow}\rangle: \; |rr\rangle + |rl\rangle - |lr\rangle + |ll\rangle$$

Following detection of the atom in |↑⟩, an additional π phase is imprinted on 
the |R⟩-part of the first photon, that is, a sign flip on |rr⟩ and |rl⟩, whereas the 
photonic state is left unaltered upon detection of |↓⟩. Thereby, the final photonic 
state becomes 

$$|rr\rangle + |rl\rangle - |lr\rangle + |ll\rangle$$

independent of the outcome of the atomic state detection. It differs from the input 
state by a minus sign on |lr⟩ only. Hence, the total circuit acts as a pure photonic 
CPF gate: 

$$|RR\rangle \to |RR\rangle, \quad |RL\rangle \to |RL\rangle, \quad |LR\rangle \to -|LR\rangle, \quad |LL\rangle \to |LL\rangle$$
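
The stepwise analysis above can be cross-checked numerically in the eight-dimensional space. The following is a minimal sketch, not the authors' code: the two rotation matrices are the ones implied by the state evolution described in this section (an assumption of this example), the basis ordering is atom, photon 1, photon 2 with index 0 standing for "up" or "R", and all names are illustrative.

```python
import numpy as np

S = 1.0 / np.sqrt(2.0)
ROT_PLUS = S * np.array([[1.0, -1.0], [1.0, 1.0]])    # atomic rotation by +pi/2
ROT_MINUS = S * np.array([[1.0, 1.0], [-1.0, 1.0]])   # atomic rotation by -pi/2
I2 = np.eye(2)
P0 = np.diag([1.0, 0.0])  # projector onto atom "up" or photon "R" (index 0)


def kron3(atom, photon1, photon2):
    return np.kron(atom, np.kron(photon1, photon2))


# Controlled-Z: sign flip when the atom is "up" and the given photon is "R".
CZ_PHOTON1 = np.eye(8) - 2.0 * kron3(P0, P0, I2)
CZ_PHOTON2 = np.eye(8) - 2.0 * kron3(P0, I2, P0)

# Full sequence before the atomic measurement (rightmost factor acts first).
CIRCUIT = (kron3(ROT_PLUS, I2, I2) @ CZ_PHOTON2 @ kron3(ROT_MINUS, I2, I2)
           @ CZ_PHOTON1 @ kron3(ROT_PLUS, I2, I2))

# Feedback after detecting "up": pi phase on the R part of photon 1.
FEEDBACK = np.diag([-1.0, -1.0, 1.0, 1.0])  # photon basis RR, RL, LR, LL


def photonic_map(outcome):
    """4x4 map on the two photons for atomic outcome 0 ("up") or 1 ("down")."""
    atom_in = np.array([[1.0], [0.0]])            # atom starts in "up"
    atom_out = np.eye(2)[outcome].reshape(1, 2)   # bra of the detected state
    reduced = np.kron(atom_out, np.eye(4)) @ CIRCUIT @ np.kron(atom_in, np.eye(4))
    if outcome == 0:
        reduced = FEEDBACK @ reduced
    return np.sqrt(2.0) * reduced  # each outcome occurs with probability 1/2


CPF = np.diag([1.0, 1.0, -1.0, 1.0])
print(np.allclose(photonic_map(0), CPF), np.allclose(photonic_map(1), CPF))
```

Both measurement branches reduce to the same diagonal map with a single minus sign on the LR component, in line with the statement that the result is independent of the outcome of the atomic state detection.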

Calibration of atomic single-qubit rotations. To calibrate the relevant experimental 
parameters, we employ a Ramsey-like sequence of three subsequent rotation pulses. 


The pulses are exactly timed as in the gate sequence (see Fig. 2), but the two photon 
pulses interleaved between the Raman pulses are turned off. 
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Initially, the atom is prepared in |}). The Raman pair is red-detuned by 
131 GHz from the D, line of *’Rb. Employing an acousto-optic modulator, we 
scan one of the Raman lasers over 2.5 MHz while the frequency of the other is 
fixed. Thus, we effectively scan the two-photon detuning. Extended Data Fig. 1 
shows a spectrum depicting the population in |↓⟩ as a function of the two-photon
detuning. Ideally, the gate experiments are performed on two-photon resonance.
In this case, the second pulse compensates the first and the third one brings the
atom into the superposition state (|↑⟩ + |↓⟩)/√2, such that a 50% population in |↓⟩
is obtained.

To determine the experimental parameters that guarantee this situation, a theoretical
model is fitted to the spectrum. It allows us to simultaneously access several
mutually dependent fit parameters that are useful in calibrating the frequency as
well as the intensity of our Raman beams. The fit reveals the Rabi frequency for the
transition between |↑⟩ and |↓⟩, which we tune to 250 kHz to obtain π/2 pulses in
1 μs. The two-photon detuning is also extractable from the fit and we find a light
shift of 40 kHz that is due to the Raman lasers. To compensate for it, we choose
different two-photon detunings when the pulses are on and off, such that two-photon
resonance is guaranteed during the entire sequence.
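
The quoted pulse length follows directly from the Rabi frequency. As a short illustrative check (assuming resonant driving and a Rabi frequency given in cycles per second; `pulse_duration` is a hypothetical helper name):

```python
import math

def pulse_duration(rotation_angle_rad, rabi_frequency_hz):
    """Time needed to rotate the qubit by the given angle on resonance."""
    return rotation_angle_rad / (2 * math.pi * rabi_frequency_hz)

if __name__ == "__main__":
    t = pulse_duration(math.pi / 2, 250e3)
    print(f"pi/2 pulse: {t * 1e6:.2f} us")
```

A full Rabi cycle at 250 kHz lasts 4 μs, so a quarter cycle (a π/2 rotation) takes 1 μs.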
Transverse optical mode matching. Good overlap between the transverse mode 
profiles of the incoming wave packet and the optical cavity is essential for the per- 
formance of the gate. To achieve this, the qubit-carrying photon pulses are taken 
from a single-mode fibre with its mode matched to the cavity. In a characterization 
measurement we determined that 92% of probe light emanating from the cavity 
is coupled into this input fibre. Therefore, 8% of the impinging light may arrive 
in an orthogonal mode that does not interact with the atom-cavity system. Light 
in this mode reduces the fidelity of the gate if it is collected at the output. This 
problem is overcome because the delay fibre also acts as a filter for the transverse 
mode profile after the cavity. The mode overlap between cavity and delay fibre is 
84%, partially suffering from mode distortion by the acousto-optical deflector used
for path switching. From an analysis of cavity reflection spectra we can estimate 
the amount of light that did not interact with the cavity but is still coupled from 
the input fibre into the delay fibre. It is below 1% of the gate output, such that the 
resulting reduction of the gate fidelity is also well below 1%. 

A small misalignment, for example, due to slow temperature drifts, reduces
the positive filtering effect described above. Therefore, optimal mode matching is
essential to maintain maximum gate fidelity. In the experiment, reflection spectra of
the empty cavity were constantly monitored and, whenever necessary, data taking
was interrupted to re-establish optimal mode overlap.
Simulation of imperfections. To understand the imperfections encountered in 
the experiment, we have set up a model of both photonic qubits and the atomic 
ancilla qubit in terms of their three-particle density matrix ρ. Under ideal conditions,
the density matrix transforms via sequential unitary transformations U
as ρ → UρU†, and known error sources can be introduced at each specific step.
Finally, the fidelity of p with the desired target state is calculated for comparison 
with the experimental value. 

In this scenario, an unnoticed, incorrect preparation of the atom creates an 
incoherent admixture of the wrong initial state. Errors in the atomic state detection 
lead to an exchange of the photonic submatrices corresponding to each atomic 
state. Detector dark counts are modelled as an admixture of a fully mixed state and 
decoherence effects are taken into account as reductions in off-diagonal elements 
of ρ. Cases where photons do not enter the cavity because of geometric mode
mismatch are included with a phase shift of zero, and the case of an undetected
additional photon in one of the weak pulses is incorporated with a phase shift of
2π, that is, twice the ideal value. Interestingly, most deteriorations of the atom-photon
interaction, such as fluctuations of the atomic, cavity and photon frequencies,
all condense into a variation, Δφ = ±0.15, of the conditional phase shift.
Considering this together with the polarization rotation R(ε) that a photon experiences
owing to the residual cavity birefringence by an angle of ε = 0.06π in the
case of |↓⟩, the ideal atom-photon controlled-Z gate, diag(−1, 1, 1, 1) in the
basis {|↑R⟩, |↑L⟩, |↓R⟩, |↓L⟩}, must be replaced by:


$$\begin{pmatrix} e^{i(\pi+\Delta\varphi)} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & R(\epsilon) \end{pmatrix}$$


Random fluctuations in some of the parameters enter our model by integrating the 
resulting density matrix over the assumed Gaussian distribution function. 
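
The structure of such a model can be illustrated for a single atom-photon reflection. This is a minimal sketch under stated assumptions, not the authors' simulation: the 2×2 block R(ε) is taken to be a plain real rotation (the actual rotation axis is an assumption here), the phase fluctuation is sampled from a Gaussian whose width is set to 0.15 purely for illustration, and the coherence factor 0.95 is an arbitrary example value. All function names are hypothetical.

```python
import numpy as np

def imperfect_cz(delta_phi, eps):
    """Atom-photon gate in the basis (upR, upL, downR, downL).

    The block acting on the down-states is modelled as a real rotation by eps;
    this choice of rotation is an assumption of the sketch.
    """
    u = np.zeros((4, 4), dtype=complex)
    u[0, 0] = np.exp(1j * (np.pi + delta_phi))
    u[1, 1] = 1.0
    c, s = np.cos(eps), np.sin(eps)
    u[2:, 2:] = [[c, -s], [s, c]]
    return u

def evolve(rho, u):
    """Unitary step rho -> U rho U^dagger."""
    return u @ rho @ u.conj().T

def dephase(rho, coherence):
    """Decoherence as a reduction of all off-diagonal elements."""
    diag = np.diag(np.diag(rho))
    return diag + coherence * (rho - diag)

def fidelity(rho, target):
    """Overlap <target|rho|target> with a pure target state."""
    return float(np.real(target.conj() @ rho @ target))

def averaged_state(rho, eps, sigma, samples=2000, seed=0):
    """Monte Carlo average of rho over Gaussian phase fluctuations."""
    rng = np.random.default_rng(seed)
    phases = rng.normal(0.0, sigma, size=samples)
    return sum(evolve(rho, imperfect_cz(p, eps)) for p in phases) / samples

if __name__ == "__main__":
    psi = np.full(4, 0.5, dtype=complex)
    rho_in = np.outer(psi, psi.conj())
    target = np.diag([-1, 1, 1, 1]) @ psi
    ideal = evolve(rho_in, imperfect_cz(0.0, 0.0))
    noisy = dephase(averaged_state(rho_in, eps=0.06 * np.pi, sigma=0.15), 0.95)
    print(fidelity(ideal, target), fidelity(noisy, target))
```

Averaging the evolved density matrix over sampled phases plays the role of the Gaussian integration mentioned above; the full model would chain several such steps for both photons and the atom.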
Extended Data Figure 1 | Ramsey-like spectrum to calibrate the atomic
state rotations. After initialization of the atom in |↑⟩, we perform the
same sequence of three Raman pulses as in the gate protocol. The final
population in |↓⟩ is determined as a function of the two-photon detuning
of the employed Raman pair with respect to the frequency difference
between the two atomic qubit states. The solid dots are measured data with
statistical error bars (standard error of the mean). The solid line is the fit of
a theoretical model based on the sequence of rotations. It yields results for
the Rabi frequency of the atomic spin rotation, an offset of the two-photon
detuning (induced, for example, by ambient magnetic fields), and the
light shift imposed by the Raman laser pair, all with ±3 kHz precision.
Single-layer MoS₂ nanopores as nanopower generators

Jiandong Feng, Michael Graf, Ke Liu, Dmitry Ovchinnikov, Dumitru Dumcenco, Mohammad Heiranian, Vishal Nandigana,
Narayana R. Aluru, Andras Kis & Aleksandra Radenovic


Making use of the osmotic pressure difference between fresh
water and seawater is an attractive, renewable and clean way
to generate power and is known as ‘blue energy’. Another
electrokinetic phenomenon, called the streaming potential,
occurs when an electrolyte is driven through narrow pores either
by a pressure gradient or by an osmotic potential resulting from
a salt concentration gradient. For this task, membranes made of
two-dimensional materials are expected to be the most efficient,
because water transport through a membrane scales inversely
with membrane thickness. Here we demonstrate the use of
single-layer molybdenum disulfide (MoS₂) nanopores as osmotic
nanopower generators. We observe a large, osmotically induced
current produced from a salt gradient with an estimated power
density of up to 10⁶ watts per square metre, a current that can be
attributed mainly to the atomically thin membrane of MoS₂. The low
power requirements of nanoelectronic and optoelectric devices
can be met by a neighbouring nanogenerator that harvests
energy from the local environment, for example, a piezoelectric
zinc oxide nanowire array or single-layer MoS₂ (ref. 12). We use
our MoS₂ nanopore generator to power a MoS₂ transistor, thus
demonstrating a self-powered nanosystem.



MoS₂ nanopores have already demonstrated better water-transport
behaviour than graphene owing to the enriched hydrophilic surface
sites (provided by the molybdenum) that are produced following
either irradiation in a transmission electron microscope (TEM) or
electrochemical oxidation. The osmotic power is generated by separating
two reservoirs containing potassium chloride (KCl) solutions
of different concentrations with a freestanding MoS₂ membrane, into
which a single nanopore has been introduced. A chemical potential
gradient arises at the interface of these two liquids at a nanopore in a
0.65-nm-thick, single-layer MoS₂ membrane, and drives ions spontaneously
across the nanopore, forming an osmotic ion flux towards
the equilibrium state (Fig. 1a). The presence of surface charges on the
pore screens the passing ions according to their charge polarity, and
thus results in a net measurable osmotic current, known as reverse
electrodialysis. This cation selectivity can be better understood by
analysing the concentration of each ion type (potassium and chloride)
as a function of the radial distance from the centre of the pore, as we
show here through molecular-dynamics simulations (Fig. 1b).

We fabricated MoS₂ nanopores either by TEM (Fig. 1c) or by the
recently demonstrated electrochemical reaction (ECR) technique.
With a typical nanopore diameter in the range 2-25 nm, a stable
osmotic current can be expected, owing to the long time required for
the system to reach equilibrium. We measured the osmotic current
and voltage across the pore by using a pair of Ag/AgCl electrodes to
characterize the current-voltage (I-V) response of the nanopore.

Figure 1 | Harvesting osmotic energy with MoS₂ nanopores. a, The
experimental set-up. Salt solutions with different concentrations are
separated by a 0.65-nm-thick MoS₂ nanopore membrane. An ion flux driven
by chemical potential through the pore is screened by the negatively
charged pore, forming a diffusion current composed of mostly positively
charged ions. b, Top panel, a typical simulation box used in
molecular-dynamics simulations, showing the nanopore membrane (in blue
and yellow) and the salt (green and red) in solution. Bottom panel,
molecular-dynamics-simulated potassium-ion and chloride-ion
concentrations (in M) as a function of the radial distance (in Å) from the
centre of the pore, for concentration ratios C_max/C_min of 100, 500 and
1,000. The region near the charged wall of the pore is representative of
the electrical double layer. C_max, maximum concentration; C_min, minimum
concentration. c, Example of a TEM-drilled MoS₂ nanopore of diameter 5 nm.

To gain a better insight into the performance of the MoS₂ nanopore
power generator, we first characterized the ionic transport properties
of MoS₂ nanopores under various ionic concentrations and pH conditions,
which can provide information on the surface charge of the
MoS₂ nanopore. Figure 2a shows the I-V characteristics of MoS₂ nanopores
of various diameters. A large pore conductance originates from
the ultrathin nature of the membrane. The conductance also depends
on the salt concentration (Fig. 2b) and shows saturation at low salt
concentrations, a signature of the presence of surface charge on the
nanopore. The predicted pore conductance (G), taking into account
the contribution of the surface charge (Σ), is given by:

$$G=\kappa_b\left[\frac{4L}{\pi d^{2}}\,\frac{1}{1+4\,l_{\mathrm{Du}}/d}+\frac{2}{\alpha d+\beta\,l_{\mathrm{Du}}}\right]^{-1}\qquad(1)$$

where κ_b is the bulk conductivity, L is the pore length, d is the pore
diameter, l_Du is the Dukhin length (which can be approximated by
(|Σ|/e)/(2c_s), where e is the elementary charge and c_s is the salt
concentration), α is a geometrical prefactor that depends on the model
used (here, α = 2), and β can also be approximated to be 2 to obtain the
best fitting agreement. From the fitting results shown in Fig. 2b, we find
surface charge values of −0.024 C m⁻², −0.053 C m⁻² and −0.088 C m⁻²
for pores of size 2 nm, 6 nm and 25 nm, respectively, at pH 5.
These values are comparable to those reported recently for graphene
nanopores (−0.039 C m⁻²) and nanotubes (−0.025 C m⁻² to
−0.125 C m⁻²) at pH 5. The surface charge density can be further
modulated by adjusting the pH to change the pore surface chemistry
(Fig. 2c). The conductance increases with an increase in pH, suggesting
the accumulation of more negative surface charges in MoS₂ nanopores.
The simulated conductance from equation (1) at 10 mM KCl is linearly
proportional to the surface charge values; thus, pH changes could
substantially increase the surface charge, up to 0.3-0.8 C m⁻². The chemical
reactivity of MoS₂ to pH is also supported by measurements of zeta
potential on MoS₂ (ref. 21). However, we also notice that, as with other
nanofluidic systems, the surface charge density varies from pore to
pore, which means that different pores can have disparate values of the
equilibrium constant, owing to the various combinations of Mo and S
atoms at the edge of the pores (as illuminated by molecular-dynamics
simulations).

Figure 2 | Electrical conductance and chemical reactivity of the MoS₂
nanopore. a, Current-voltage response of MoS₂ nanopores with different
pore sizes (black, 2 nm; red, 6 nm; blue, 25 nm) in 1 M KCl at pH 5.
b, Conductance as a function of salt concentration at pH 5. By fitting the
results to equation (1), we find the extracted surface charge values to be
−0.024 C m⁻², −0.053 C m⁻² and −0.088 C m⁻² for a 2-nm, 6-nm and
25-nm pore, respectively. c, Conductance as a function of pH for a KCl
concentration of 10 mM, for a 2-nm, 6-nm and 25-nm pore.

198 | NATURE | VOL 536 | 11 AUGUST 2016

Next, we introduced a chemical potential gradient by using the KCl
concentration gradient system. The concentration gradient ratio is
defined as C_cis/C_trans, where C_cis is the KCl concentration in the cis
chamber and C_trans is that in the trans chamber; the concentration
ranges from 1 mM to 1 M. The highly negatively charged surface
selectively passes the ions (in this case potassium ions) according to
their polarity, resulting in a net positive current. By measuring the I-V
response of the pore in the concentration gradient system (Fig. 3a),
we can measure the short-circuit current (I_sc) corresponding to zero
external bias, while the osmotic potential can be obtained from the
open-circuit voltage (V_oc). The pure osmotic potential, V_os, and current,
I_os, can then be obtained by subtracting the contribution from the
electrode-solution interface at different concentrations; this contribution
follows the Nernst equation (Extended Data Fig. 1). The
osmotic potential scales with the logarithm of the concentration gradient
ratio (Fig. 3b) and shares a similar trend with the osmotic current (Fig. 3c).
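
Equation (1) can be evaluated directly once the surface charge is converted into a Dukhin length. The sketch below is illustrative only: the bulk conductivity passed in the example (about 10.5 S m⁻¹ for 1 M KCl) is an assumed round value, not taken from the text, and the function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
AVOGADRO = 6.02214076e23     # mol^-1

def dukhin_length(surface_charge, salt_molar):
    """l_Du = (|Sigma| / e) / (2 c_s), with c_s converted from mol/L to m^-3."""
    c_s = salt_molar * 1e3 * AVOGADRO
    return abs(surface_charge) / E_CHARGE / (2 * c_s)

def pore_conductance(kappa_b, length, diameter, surface_charge, salt_molar,
                     alpha=2.0, beta=2.0):
    """Pore conductance in siemens following the form of equation (1)."""
    l_du = dukhin_length(surface_charge, salt_molar)
    channel = 4 * length / (math.pi * diameter**2) / (1 + 4 * l_du / diameter)
    access = 2 / (alpha * diameter + beta * l_du)
    return kappa_b / (channel + access)

if __name__ == "__main__":
    # 6-nm pore in a 0.65-nm-thick membrane, 1 M KCl, Sigma = -0.053 C m^-2.
    g = pore_conductance(kappa_b=10.5, length=0.65e-9, diameter=6e-9,
                         surface_charge=-0.053, salt_molar=1.0)
    print(f"G = {g * 1e9:.1f} nS")
```

For zero surface charge and α = 2 the expression reduces to the familiar sum of channel and access resistance, 4L/(πd²) + 1/d, which is a convenient sanity check.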
Figure 3 | Osmotic power generation. a, Current-voltage characteristics
for a 15-nm nanopore in a 1 M/1 mM KCl gradient. The contribution
from the redox reaction on the electrodes is subtracted from the measured
total current (grey line) (Extended Data Fig. 1), producing the red dashed
line, which represents the pure osmotic contribution. I_sc and V_oc are the
short-circuit current and open-circuit voltage, respectively; I_os and V_os
are the osmotic current and potential. b, The generated osmotic potential
as a function of the salt gradient. C_cis is set to be 1 M KCl; C_trans is tunable
from 1 mM to 1 M KCl. The solid line represents a linear fitting to
equation (2). c, Osmotic current as a function of salt gradient. The solid
line fits proportionally to the linear part of equation (2). d, Osmotic
potential and current as a function of pore size. The dashed lines are a
guide to the eye and show the trend as the pore size is changed. The error
bars come from the corresponding error estimations and represent the
s.e.m. (Methods).
The measured osmotic energy conversion is also pH dependent
(Extended Data Fig. 2a, b). The increase in pH leads to higher generated
voltage and current, suggesting the importance of surface charge
to the ion-selective process.

The extracted osmotic potential is the diffusion potential and it arises
from differences in the diffusive fluxes of positive and negative ions,
because the pore is ion selective: cations diffuse more rapidly than
anions (Fig. 1). The diffusion potential, V_diff, can be described as:

$$V_{\mathrm{diff}}=S(\Sigma)_{\mathrm{is}}\,\frac{RT}{F}\,\ln\!\left[\frac{a^{\mathrm{KCl}}_{cis}}{a^{\mathrm{KCl}}_{trans}}\right]\qquad(2)$$

Here, S(Σ)_is is the ion selectivity for the MoS₂ nanopore (and equals
1 for the ideal cation-selective case, and 0 for the non-selective case);
it is defined as S(Σ)_is = t₊ − t₋, where t₊ and t₋ are the transference
numbers for positively and negatively charged ions, respectively. F, R
and T are the Faraday constant, the universal gas constant, and the
temperature, respectively, while a_cis and a_trans (with superscript KCl in
equation (2)) are the activities of potassium ions in the cis and trans
solutions. By fitting the experimental data presented in Fig. 3b to
equation (2), we find the ion-selectivity coefficient S(Σ)_is to be 0.4,
suggesting efficient cation selectivity. This is because the size of our
nanopores lies in the range in which electrical double-layer overlap can
occur inside the pore, since the Debye length is 10 nm for 1 mM KCl.
As shown in Extended Data Fig. 3d, with a concentration gradient of
10 mM/1 mM in a 5-nm pore, the ion selectivity approaches 1, presenting
the conditions for ideal cation selectivity.
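
Equation (2) is straightforward to evaluate. The following sketch assumes a temperature of 298.15 K and, in the example, treats the activity ratio as equal to the nominal concentration ratio; both are simplifying assumptions, and `diffusion_potential` is a hypothetical name.

```python
import math

GAS_CONSTANT = 8.314462618   # J mol^-1 K^-1
FARADAY = 96485.33212        # C mol^-1

def diffusion_potential(selectivity, activity_cis, activity_trans,
                        temperature_k=298.15):
    """Diffusion potential in volts following the form of equation (2)."""
    thermal_voltage = GAS_CONSTANT * temperature_k / FARADAY
    return selectivity * thermal_voltage * math.log(activity_cis / activity_trans)

if __name__ == "__main__":
    # Selectivity 0.4 from the fit; activity ratio 1,000 is an idealization.
    v = diffusion_potential(0.4, 1.0, 1e-3)
    print(f"V_diff = {v * 1e3:.0f} mV")
```

With a selectivity of 0.4 and an activity ratio of 1,000 this gives roughly 71 mV; an ideally selective pore (selectivity 1) would give about 59 mV per decade of activity ratio.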

To test the cation-selective behaviour of the pore further, we investigated
the relationship between power generation and pore size. As
shown in Fig. 3d, small pores display better voltage behaviour, reflecting
better performance in terms of ion selectivity. The ion selectivity,
S(Σ)_is, decreases from 0.62 to 0.23 as the pore size increases. We
calculated the distribution of surface potential for different pore sizes
(2 nm, 5 nm and 25 nm) in order to compare the selectivity difference
(Extended Data Fig. 3a-c). It has been shown that the net diffusion
current stems only from the charge separation and concentration
distribution within the electrical double layer, and therefore the total
current can be expected to increase more rapidly within small pores
in the double-layer overlap range compared with larger pore sizes
(Fig. 3d). The slight decrease in current in larger pores might be attributed
to a reduced local concentration gradient, and also to probable
overestimation of the redox potential subtraction. The current can be
calculated using either a continuum-based Poisson-Nernst-Planck
(PNP) model or molecular-dynamics simulations. The measured
dependence of the osmotic potential and osmotic current on
the concentration ratio (Fig. 3b, c) is well captured by both computational
methods (molecular dynamics, Extended Data Fig. 4, and
continuum analysis, Extended Data Fig. 5a). The non-monotonic
response to pore size (Fig. 3d and Extended Data Fig. 2c, d) may be
explained by a possible depletion of the local concentration
gradient in large pores, and is also predicted by the continuum-based
PNP model (Extended Data Fig. 5b) because of the decrease in ion
selectivity.

Figure 4 | Demonstration of a self-powered nanosystem. a, Optical image
of the fabricated MoS₂ transistor, with a designed gate, and drain
and source electrodes. b, Circuit diagram for the self-powered
nanosystem: the drain-source supply for the MoS₂ transistor is provided
by a MoS₂ nanopore, while a second nanopore device operates as the gate
voltage source. D, drain; G, gate; S, source; R_p, pore resistance;
V_tg, gate voltage; V₊, nanopore output voltage. R_p connected in series
with V_tg has been omitted. c, Powering all the terminals of the transistor
with nanopore generators. The graph shows the modulated conductivity of
the MoS₂ transistor as a function of the top gate voltage (V_tg). Inset,
current-voltage characteristics at various gate voltages (−0.78 V, 0 V
and 0.78 V).

To gain further insight into the thickness scaling, we first
verified the pore-conductance relation proposed in equation (1) by
using molecular dynamics (Extended Data Fig. 6). We found that ion
mobility also scales inversely with membrane thickness (Extended
Data Fig. 7a, b), which is consistent with previous observations.
We then performed molecular-dynamics simulations of multilayer
membranes of MoS₂ to investigate the power generated by those
membranes. We observe a strong decay in the generated power as
the number of layers increases (Extended Data Fig. 7c, d), indicating
that the best osmotic power generation occurs in two-dimensional
membranes. The consistency between experiments and theoretical
models highlights two important factors in achieving efficient power
generation from a single-layer MoS₂ nanopore: atomic-scale pore
thickness and surface charge.

If we have a single-layer MoS₂ membrane with a homogeneous pore
size of 10 nm and a porosity of 30%, then, by exploiting parallelization,
the estimated power density would reach 10⁶ W m⁻² with a KCl salt
gradient. This value exceeds, by two to three orders of magnitude,
the results obtained with boron nitride nanotubes, and is a million
times higher than the power density obtained by reverse electrodialysis
with classical exchange membranes (Extended Data Table 1).

As well as using KCl concentration gradients, the nanopore power
generator concept could also be applied to liquid-liquid junction
systems with a chemical potential gradient, because the diffusion
voltage originates from the Gibbs mixing energy of the two liquids
(Supplementary Information). Thus, high-performance, nanopore-based
generators based on a large number of available liquid combinations
could be explored. For example, we have shown substantial
power generation based on a chemical potential gradient that uses two
types of liquid (Extended Data Fig. 8d). Considerable energy could also
be generated by exploiting parallelization, with multiple small pores or
even a continuous porous structure within a large area of single-layer
MoS₂ membrane, which could be scaled up for mass production using
the ECR pore-fabrication technique or plasma-based defect creation.

The use of individual nanopores as a micro/nano power source has
long been expected. We find here that an individual osmotic generator
can also serve as a nanopower source for a self-powered nanosystem,
owing to its high efficiency and power density. For this self-powered
nanosystem, we chose the high-performance single-layer MoS₂ transistor
(Fig. 4a) because of its excellent operation at low power. We
characterized this transistor in the configuration shown in Fig. 4b,
using two nanopores to apply voltages to the transistor’s drain and gate
terminals. As shown in Fig. 4c, by varying the top gate voltage in the
relatively narrow window of ±0.78 V, we could modulate the channel
conductivity by a factor of 50 to 80. Furthermore, when we fixed the
gate voltage and varied the drain-source voltage V_ds (Fig. 4c inset), we
obtained a linear I_ds-V_ds curve, demonstrating efficient injection of
electrons into the transistor channel. Further calibration with a standard
power source can be found in Extended Data Fig. 8. This system
is an ideal self-powered nanosystem in which all the devices are based
on single-layer MoS₂.

We have shown that MoS₂ nanopores are promising candidates for
investigating osmotic power generation as a renewable energy source.
The substantial power generated in our experiments can be attributed
mainly to the atomic-scale thickness of the MoS₂ membrane. Our
results also provide new avenues for studying other membrane-based
processes, such as water desalination or proton transport.
Furthermore, the nanopore generator may see application in other
ultralow-power devices, such as in electronics.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Nanopore fabrication. We fabricated MoS₂ nanopores either by using the atomic-scale
ECR technique or by electron irradiation under TEM. Prior to nanopore
fabrication, we create a freestanding MoS₂ membrane. Briefly, we use KOH wet
etching to prepare SiNₓ membranes (of size 10 μm × 10 μm to 50 μm × 50 μm;
20 nm thick). We then use focused ion beam (FIB) or e-beam lithography (followed
by reactive ion etching) to drill an opening of 50-300 nm in the membrane. Next
we suspend single-layer MoS₂ membranes, grown by chemical-vapour deposition,
on the opening by transferring them from sapphire growth substrates. TEM
irradiation can be applied to drill a single pore and image the pore. ECR is done
by applying a step-like transmembrane potential to the membrane and monitoring
the transmembrane ionic current with a Femto DLPCA-200 amplifier (Femto
Messtechnik GmbH), with a custom-made feedback control on transmembrane
conductance. Nanopores are formed when reaching the critical voltage of MoS₂
oxidation (>0.8 V). We then calibrate the pore size using I-V characteristics.
Nanofluidic measurements. Nanofluidic transport experiments are performed as
described previously. The nanopore chips are mounted in a custom-made
polymethylmethacrylate chamber, and then wetted with an H₂O:ethanol (1:1) solution. Nanofluidic
measurements are carried out by taking the I-V characteristics of the nanopore in
different KCl solutions (Sigma Aldrich; the ionic concentration or pH of the solution
varies), using an Axopatch 200B patch-clamp amplifier (Molecular Devices
Inc.). A pair of chlorinated Ag/AgCl electrodes (which have been rechlorinated
regularly) is used to apply voltage and to measure the current. In addition, the
electrode potential differences in solutions of different concentrations are calibrated
with a saturated Ag/AgCl reference electrode (Sigma Aldrich).

To measure osmotic power generation, we filled the reservoirs with solutions of
different concentrations, ranging from 1 mM to 1 M. Measurements were performed
at various pH conditions. We found that power generation was optimal at pH 11.
First, we measured the I-V response; we obtained the short-circuit current from
the intercept of the curve at zero voltage, and the open-circuit voltage from
the intercept of the curve at zero current. Next, to get the purely osmotically
driven contribution, we subtracted the contribution made by the electrode potential
difference that results from the redox potential in different concentrations
(Extended Data Fig. 1).

For all experiments, we performed cross-checking measurements, including 
changing the direction of pH and concentration to make sure that the nanopores 
were not substantially enlarged during the experiments. Most MoS₂ pores were
generally stable during hours of experiments owing to their high mechanical
strength and stability within the ±600 mV bias range. Thus, we strongly
recommend the use of small supporting FIB/ebeam-drilled opening windows 
(of diameter 50-300 nm) for suspended membranes. 

Characterization of single-layer MoS₂ transistors. We fabricated single-layer
MoS₂ transistors using a procedure similar to that in ref. 28.

For electrical measurements we used an Agilent 5270B source-meter unit
(SMU), an SR-570 low-noise current preamplifier and a Keithley 2000 digital
multimeter (DMM; input impedance >10¹⁰ Ω). All measurements were performed
in ambient conditions in the dark. An improved efficiency of power conversion
in nanopores is obtained by using a combination of pure room-temperature ionic
liquids: 1-butyl-3-methylimidazolium hexafluorophosphate (BmimPF₆) and zinc
chloride solution.

We compare the performance of the single-layer MoS₂ transistor in two cases.
First, we use two nanopores to apply V_tg and V_ds, while using a current amplifier
and voltmeter to control the current and voltage drop across the device (see
Extended Data Fig. 8a). In this case, we use voltage dividers to change the source
and gate voltage on the device (not shown in Fig. 4a and Extended Data Fig. 8a).
Second, we use the SMU to perform standard two-contact measurements.

Although the characteristics of our transistor are similar in both set-ups, we
comment here on the difference detected in the conductivity of the ON state.
We attribute it to the slow response of the device in the first case. The change in
transistor resistance that occurs when applying gate voltage leads to a change in
the impedance of the device and thus a change in the applied effective voltage,
V_dev (measured with a voltmeter connected in parallel). The nanopore reacts to
the change in impedance with a certain stabilization time (from 10 s to 100 s).
This appears as a hysteretic effect and influences the conductivity versus
gate-voltage measurements. In the second case, on the other hand, V_dev = V_ds is
constant. There are several secondary effects, which might in turn influence the
measured values of two-probe conductivity. In relatively short channel devices,
the applied V_ds might partially contribute to gating of the channel and furthermore
to modification of contact resistance. This could be understood by comparing
the values of V_ds (around 100 mV) and V_tg (780 mV). We also do not exclude the
possibility of slight doping variations and hysteretic effects that occur because
of the filling of trap states inside the transistor channel. However, by driving a
device to the ON state and stabilizing the current for a reasonable amount of
time, we obtained a very good match in drain-source I_ds-V_ds characteristics
(Extended Data Fig. 8c). We thus conclude that, although there are differences
in performance in the two cases, these differences originate mainly from the slow
response time of the nanopore.

We extracted the resistance and power of the nanopore by using the ionic
liquid BmimPF₆. By considering the simple resistor network (Extended Data
Fig. 8d, inset), we could extract the output power as a function of the load
resistance, R_load. We fit our dependence according to the following model, which
assumes a constant V_out and pore resistance R_p:

$$P=\frac{V_{\mathrm{out}}^{2}\,R_{\mathrm{load}}}{(R_{p}+R_{\mathrm{load}})^{2}}$$

and found a good fit with V_out = 0.83 V, which is close to the measured V_out of 0.78 V,
and with a nanopore impedance, R_p, of 9.4 ± 2.1 MΩ (Extended Data Fig. 8d).
Data analysis. All data analysis has been done using custom-made Matlab
(R2016a) code. First, we recorded I-V characteristics with an Axopatch 200B
amplifier, by using either an automatic or a manual voltage switch. We then
segmented the current trace into pieces of constant voltage, V. We extracted the
mean, i(V), and standard deviation, σ(V), of the stable part of each segment and
generated an I-V plot. The error bars are the standard deviations (see Fig. 3 and
Extended Data Fig. 2). All I-V characteristics were linear. In order to propagate
the error correctly, we used a linear fitting method. Using this method, we can
extract the a, b, σ_a and σ_b values of the first-order polynomial I(V) = bV + a. The
conductance is the slope, b, of the I-V curve, and a describes the offset. The height
of the error bars reported for conductance measurements is 2σ_b.

We report the osmotic power generation using the osmotic current, I_os, and
osmotic voltage, V_os. Starting from the linear-fit values of the I-V plot, we can
calculate the measured current and voltage: I_meas = a and V_meas = a/b. These measured
values have to be adjusted for the electrode potential: V_os = V_meas − V_redox
and I_os = (V_os/V_meas) × I_meas. Assuming an uncertainty in our estimation of the redox
potential, σ_redox, of 5%, we can propagate the errors using the following formulas:

$$\sigma_{V_{os}}=\sqrt{\left(\frac{\sigma_a}{b}\right)^{2}+\left(\frac{a\,\sigma_b}{b^{2}}\right)^{2}+\sigma_{redox}^{2}}$$

$$\sigma_{I_{os}}=\sqrt{\sigma_a^{2}+\left(V_{redox}\,\sigma_b\right)^{2}+b^{2}\,\sigma_{redox}^{2}}$$

We used these relations to calculate the error bars shown in plots of osmotic voltage
and current (Fig. 3 and Extended Data Fig. 2).
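
The correction and error propagation described above can be sketched as follows. This is an illustrative implementation that treats the uncertainties in a, b and the redox potential as uncorrelated (a simplifying assumption, since a and b come from the same fit); the function names and the example numbers are hypothetical.

```python
import math

def osmotic_values(a, b, v_redox):
    """Osmotic voltage and current from the linear fit I(V) = b*V + a."""
    v_meas = a / b
    v_os = v_meas - v_redox
    i_os = (v_os / v_meas) * a       # algebraically equal to a - b * v_redox
    return v_os, i_os

def osmotic_errors(a, b, sigma_a, sigma_b, v_redox, sigma_redox):
    """First-order propagation of uncorrelated uncertainties."""
    sigma_v = math.sqrt((sigma_a / b) ** 2
                        + (a * sigma_b / b**2) ** 2
                        + sigma_redox**2)
    sigma_i = math.sqrt(sigma_a**2
                        + (v_redox * sigma_b) ** 2
                        + (b * sigma_redox) ** 2)
    return sigma_v, sigma_i

if __name__ == "__main__":
    # Made-up fit values: offset 2 nA, conductance 20 nS, redox potential 50 mV.
    a, b, v_redox = 2e-9, 20e-9, 0.05
    v_os, i_os = osmotic_values(a, b, v_redox)
    s_v, s_i = osmotic_errors(a, b, 0.05e-9, 0.5e-9, v_redox, 0.05 * v_redox)
    print(f"V_os = {v_os * 1e3:.1f} +/- {s_v * 1e3:.1f} mV")
    print(f"I_os = {i_os * 1e9:.2f} +/- {s_i * 1e9:.2f} nA")
```

Because I_os simplifies to a − bV_redox, its uncertainty depends on σ_b only through the redox correction, which is why the second formula has no a/b² term.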

Computational simulations. Molecular-dynamics simulations. These simulations
were performed using the LAMMPS package. A MoS₂ membrane was placed
between two KCl solutions as shown in Extended Data Fig. 4a. A fixed graphene
wall was placed at the end of each solution reservoir. A nanopore was drilled in
MoS₂ by removing the desired atoms. The accessible pore diameter considered in
all of the molecular-dynamics simulations is 2.2 nm with a surface charge density
of −0.04694 C m⁻². The system dimensions were 6 nm × 6 nm × 36 nm in the x, y
and z directions, respectively. We used the extended simple point charge (SPC/E)
water model, and applied the SHAKE algorithm to maintain the rigidity of each
water molecule. The Lennard-Jones (LJ) parameters are tabulated in Supplementary
Table 1. The LJ cut-off distance was 12 Å. The long-range interactions were computed
by the particle-particle particle-mesh (PPPM) method. Periodic boundary
conditions were applied in the x and y directions. The system is non-periodic in
the z direction. For each simulation, first the energy of the system was minimized
for 10,000 steps. Next, the system was equilibrated in the isothermal-isobaric
(otherwise known as NPT) ensemble for 2 ns at a pressure of 1 atm and a temperature
of 300 K to reach the equilibrium density of water. Graphene and MoS₂
atoms were held fixed in space during the simulations. Then, canonical (NVT)
simulations were performed, during which the temperature was maintained
at 300 K by using the Nosé-Hoover thermostat with a time constant of 0.1 ps
(refs 35, 36). Trajectories of atoms were collected every picosecond to obtain the
results. For accurate mobility calculations, however, the trajectories were stored
every ten femtoseconds.

Continuum model. We also used the continuum-based two-dimensional Poisson-Nernst-Planck
(PNP) model. We neglected the contribution of H⁺ and OH⁻ ions
in this calculation, as their concentrations are much lower than the bulk
concentration of the other ionic species (K⁺ and Cl⁻). Hence, water-dissociation
effects are not considered in the numerical model. Further, we assumed that the
ions are immobile inside the steric layer and do not contribute to the ionic current.
We also did not model the Faradaic reactions occurring near the electrode. Finally,
we assumed that the convective component of current originating from the fluid
flow is negligible and does not contribute to the non-monotonic osmotic current
observed in our experiments. We validated this assumption by performing detailed
all-atom molecular-dynamics simulations and predicting the contribution of
electroosmotic velocity in comparison with the drift velocity of the ions.

Under these assumptions, the total flux of each ionic species ($\Gamma_i$) is contributed
by a diffusive component resulting from the concentration gradient, and an
electrophoretic component arising from the potential gradient, as given by:

$$\Gamma_i = -D_i \nabla c_i - \Omega_i z_i F c_i \nabla \phi$$

where F is Faraday’s constant, $z_i$ is the valence of the ith species, $D_i$ is the diffusion
coefficient, $\Omega_i$ is the ionic mobility, $c_i$ is the concentration and $\phi$ is the electrical
potential. Note that the ionic mobility is related to the diffusion coefficient
by Einstein's relation (ref. 37), $\Omega_i = D_i/(RT)$, where R is the ideal gas constant and T is the
thermodynamic temperature. The mass transport of each ionic species is:


$$\frac{\mathrm{d}c_i}{\mathrm{d}t} = -\nabla \cdot \Gamma_i$$
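
To make the flux expression and Einstein's relation concrete, the short Python sketch below evaluates the one-dimensional Nernst-Planck flux of a single ionic species. The function names, the one-dimensional simplification and the sample numbers are assumptions made for illustration; the actual model is two-dimensional and was solved with a finite volume method, which this sketch does not reproduce.

```python
import math

F = 96485.33212   # Faraday constant, C mol^-1
R = 8.314462618   # ideal gas constant, J mol^-1 K^-1


def mobility(D, T=300.0):
    """Ionic mobility from Einstein's relation, Omega = D / (R T)."""
    return D / (R * T)


def nernst_planck_flux(D, z, c, dc_dx, dphi_dx, T=300.0):
    """One-dimensional flux (mol m^-2 s^-1) of one ionic species.

    D        : diffusion coefficient, m^2 s^-1
    z        : valence (for example +1 for K+, -1 for Cl-)
    c        : local concentration, mol m^-3
    dc_dx    : concentration gradient, mol m^-4
    dphi_dx  : potential gradient, V m^-1
    """
    diffusive = -D * dc_dx
    electrophoretic = -mobility(D, T) * z * F * c * dphi_dx
    return diffusive + electrophoretic


if __name__ == "__main__":
    # Hypothetical local state: 1 M KCl (1000 mol m^-3), K+ diffusivity.
    print(nernst_planck_flux(D=1.96e-9, z=+1, c=1000.0,
                             dc_dx=-1.0e9, dphi_dx=0.0))
```

A useful sanity check on any such implementation is that the flux vanishes when the concentration follows a Boltzmann distribution in the local potential, because the diffusive and electrophoretic terms then cancel exactly.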


The individual ionic current ($I_i$) across the reservoir and the pore is calculated by
integrating their respective fluxes over the cross-sectional area, that is:

$$I_i = \int_S z_i F \Gamma_i \, \mathrm{d}S$$

The total ionic current at any axial location is calculated as $I = \sum_{i=1}^{m} \int_S z_i F \Gamma_i \, \mathrm{d}S$, where
S is the cross-sectional area corresponding to the axial location and m is the
number of ionic species. To determine the electric potential along the
system, we solve the Poisson equation:


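$$\nabla \cdot (\epsilon_r \nabla \phi) = -\frac{\rho_e}{\epsilon_0}$$

where $\epsilon_0$ is the permittivity of free space, $\epsilon_r$ is the relative permittivity of the
medium and $\rho_e$ is the net space charge density of the ions, defined as:

$$\rho_e = F \sum_{i=1}^{m} z_i c_i$$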


We provide the necessary boundary conditions for the closure of the problem. The 
normal flux of each ion is assumed to be zero on all the walls so that there is no 
leakage of current. To conserve charge on the walls of the pore, the electrostatic 
boundary condition is given by: 


$$\mathbf{n} \cdot \nabla \phi = \frac{\sigma}{\epsilon_0 \epsilon_r}$$

where n denotes the unit normal vector (pointing outwards) to the wall surface
and $\sigma$ is the surface charge density of the walls. The bulk concentration of the cis
reservoir is maintained at $C_\mathrm{max}$ and the bulk concentration of the trans reservoir
is maintained at $C_\mathrm{min}$. As we are interested in understanding the osmotic short-circuit
current, $I_\mathrm{sc}$, we do not apply any voltage difference across the reservoirs.
Thus, the boundary conditions at the ends of the cis and trans reservoirs are 
specified as: 


$$c_i = C_\mathrm{max}, \quad \phi = 0 \quad \text{(cis)}$$

$$c_i = C_\mathrm{min}, \quad \phi = 0 \quad \text{(trans)}$$


The coupled PNP equations are numerically solved using the finite volume
method in OpenFOAM (http://www.openfoam.com/). The details of solver
implementation are discussed in refs 38-40. The simulated domain consisted
of a MoS₂ nanopore of length 0.6 nm and diameter varying from
2.2 nm to 25 nm. The simulated length of the reservoir was $L_\mathrm{cis} = L_\mathrm{trans}$ = 11 nm;
the diameter of the reservoir was 50 nm. KCl buffer solution was used in the
simulation. The bulk concentration of the cis reservoir was fixed at 1 M and the
concentration in the trans reservoir was varied systematically from 1 mM
to 1 M. The simulation temperature was 300 K. The bulk diffusivities of K⁺ and
Cl⁻ were 1.96 × 10⁻⁹ m² s⁻¹ and 2.03 × 10⁻⁹ m² s⁻¹, respectively. The dielectric constant of
the aqueous solution was assumed to be 80. We also assumed zero surface charge
density on the walls of the reservoir, as the reservoir is too far away from the
nanopore to have an influence on the transport. Unless otherwise stated, the
charge on the walls of the MoS₂ nanopore is assumed to be σ = −0.04694 C m⁻²,
consistent with the surface charge calculated from our molecular-dynamics
simulations.
Extended Data Figure 1 | Subtraction of the contribution made redox potential difference. b, Electrode contribution as a function of the
showing the contributions of different parts of the system to the overall measured electrode redox potential differences at the reference electrode.
measured voltage or current. $V_\mathrm{measured}$ is the measured voltage; $E_\mathrm{redox}$ is the 14-nm pore in 1 M KCl/1 mM KCl. Inset, the design of the fluidic cell.
fluctuates to negative, indicating that the pore charge is relatively low.
One possible explanation for the negative voltage point is that the surface 
charge on the pore has fluctuated to positive. c, d, Osmotic potential (c) 
and osmotic current (d) generated using two different pores (3-nm and 
15-nm) at pH 11 in different concentration gradients. 
concentrations as a function of the radial distance from the centre of
the pore, for different concentration ratios. d, Short-circuit current as 
a function of the concentration ratio. e, Open-circuit electric field as a 
function of the concentration ratio. 
Extended Data Figure 5 | Continuum-based PNP modelling of power generation. a, Short-circuit current, $I_\mathrm{sc}$, as a function of the concentration
gradient ratio. The diameter of the nanopore here is 2.2 nm. b, $I_\mathrm{sc}$ as a function of the nanopore diameter. The salinity concentration ratio is fixed at 1,000.
The surface charge of the nanopore, σ, is −0.04694 C m⁻².
KCl solution. The inset illustrates simulated multilayer membranes.
standard two-probe measurements made with an external source. c, I-V
characteristics at Vig = 0.78 V after current stabilization, measured in 
both set-ups. d, Output power of nanopore in BmimPF₆/zinc chloride
as a function of load resistance, $R_\mathrm{load}$. Inset, circuit diagram for these
measurements. 
Extended Data Table 1 | Power generation according to membrane thickness

Reverse electrodialysis cells    Power density (W/m²)    Membrane thickness
This work                        10⁶                     0.65 nm

The table shows the power generated by membranes of different thickness; data from refs 5, 22, 41-45.
Abrupt plate accelerations shape rifted continental margins

Sascha Brune, Simon E. Williams, Nathaniel P. Butterworth & R. Dietmar Müller


Rifted margins are formed by persistent stretching of continental 
lithosphere until breakup is achieved. It is well known that strain-rate-dependent
processes control rift evolution (refs 1, 2), yet quantified
extension histories of Earth’s major passive margins have become 
available only recently. Here we investigate rift kinematics globally 
by applying a new geotectonic analysis technique to revised global 
plate reconstructions. We find that rifted margins feature an initial, 
slow rift phase (less than ten millimetres per year, full rate) and 
that an abrupt increase of plate divergence introduces a fast rift 
phase. Plate acceleration takes place before continental rupture and 
considerable margin area is created during each phase. We reproduce 
the rapid transition from slow to fast extension using analytical and 
numerical modelling with constant force boundary conditions. The 
extension models suggest that the two-phase velocity behaviour is 
caused by a rift-intrinsic strength-velocity feedback, which can 
be robustly inferred for diverse lithosphere configurations and
rheologies. Our results explain differences between proximal and
distal margin areas (ref. 3) and demonstrate that abrupt plate acceleration
during continental rifting is controlled by the nonlinear decay of the 
resistive rift strength force. This mechanism provides an explanation 
for several previously unexplained rapid absolute plate motion 
changes, offering new insights into the balance of plate driving 
forces through time. 

Rifted continental margins, with an overall length of more than 
100,000 km, are the longest tectonic features on our planet, two times 
longer than spreading ridges or convergent plate boundaries. During 
formation of rifted margins, new continental surface area is generated 
by normal faulting and volcanic intrusions. Both processes are depend- 
ent on extension velocity, which governs the thermal configuration of 
the rift and hence the depth of the brittle-ductile transition and the 
length of normal faults’, as well as the degree of decompression melting 
and serpentinization?. Moreover, rift velocity has been shown to control 
rift symmetry and the formation of hyper-extended crust”. 

Quantifying the history of extension velocity at rifted margins 
requires knowledge of the motions between diverging plates and of 
the timing of continental breakup. Recently, revised regional syn-rift 
plate models for the opening of the Atlantic, South China Sea, Gulf 
of California and Australia–Antarctica rifting have become available
(see Supplementary Table 1). Here, we incorporate these regional stud- 
ies in a global plate kinematic model® and use an updated, simplified 
global set of boundaries between continental and oceanic crust (COBs) 
to explore continental breakup processes. We exploit the fact that in 
a pre-rift reconstruction, present-day COBs from conjugate passive 
margins will show substantial overlap, since the plate tectonic models 
do not explicitly incorporate lithospheric deformation. As the plates 
move apart the overlap decreases, and the time when conjugate COBs 
disconnect defines breakup, the transition from rifting to sea-floor 
spreading (Fig. 1). 

We extracted the local rift velocity via pyGPlates, a novel Python 
library that allows script-based access to the plate reconstruction soft- 
ware GPlates. We subdivided each COB in segments of ~50 km length, 


and computed the relative velocity between the two contributing plates 
at each segment (Fig. 1c). First we address the question of whether there 
is a systematic trend in the temporal evolution of extension velocity 
within entire rift systems. We visualize the velocity evolution of the rift 
system in a single diagram by displaying the integrated rift axis length 
of all segments that deform within a certain velocity range in 1 million 
year (Myr) time intervals (Fig. 1d). We discarded any segment where 
breakup is accomplished, that is, where COBs do not overlap anymore. 
Hence, the analysed plate boundary length declines through time 
(see dashed grey line in Fig. 1e) and reduces to zero at final continental
separation. We explicitly excluded failed rifts from our analysis, because 
they do not contribute to passive margin formation. 
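
The velocity-frequency diagram described above amounts to summing, for each time step, the lengths of all syn-rift segments whose velocity falls in a given range. A minimal Python sketch of that binning for a single time step follows. The function name, the tuple layout and the example values are hypothetical, and the velocities are assumed to have been extracted beforehand (in the study this was done with pyGPlates, which is not used here).

```python
def length_by_velocity_bin(segments, bin_edges):
    """Integrated rift axis length (km) per velocity bin for one time step.

    segments  : iterable of (velocity_mm_per_yr, length_km, broken_up)
    bin_edges : ascending velocity bin edges in mm/yr; bin k covers
                [bin_edges[k], bin_edges[k + 1])
    """
    totals = [0.0] * (len(bin_edges) - 1)
    for velocity, length, broken_up in segments:
        if broken_up:
            # Segments whose conjugate COBs no longer overlap are discarded.
            continue
        for k in range(len(totals)):
            if bin_edges[k] <= velocity < bin_edges[k + 1]:
                totals[k] += length
                break
    return totals


if __name__ == "__main__":
    # Four hypothetical ~50 km segments; the last one has already broken up.
    example = [(4.0, 50.0, False), (8.0, 50.0, False),
               (22.0, 50.0, False), (30.0, 50.0, True)]
    print(length_by_velocity_bin(example, [0.0, 10.0, 20.0, 40.0]))
```

Repeating this for every 1 Myr interval and stacking the results gives one column of the diagram per time step; the sum over all bins is the analysed plate boundary length, which declines as segments break up.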

In addition to rift velocity we computed the rate of rifted margin for- 
mation, that is, the product of the extension velocity and the velocity- 
orthogonal length of each COB segment. Integration along both 
conjugate margins and division by 2 yields the overall rate of rifted 
margin formation F. The formation rate F increases with extension 
velocity and decreases when individual segments are discarded during
diachronous breakup (Fig. 1e). Note that F is independent of the
distance between hinge line and COB, instead representing the newly 
created margin surface. This allows us to draw robust conclusions on 
rifted margin growth as no assumptions about previous rift phases, or 
initial crustal and lithospheric thickness have to be made. 
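
The definition of the formation rate F can be written out as a short calculation. The Python sketch below is an illustration under stated assumptions: the function name and the tuple layout are hypothetical, and the velocity-orthogonal length is taken as the segment length multiplied by the sine of the angle between the segment strike and the velocity direction, which is one straightforward reading of the description above.

```python
import math


def margin_formation_rate(segments):
    """Rate of rifted margin formation F in km^2 per Myr.

    segments : iterable of (velocity_mm_per_yr, length_km, angle_deg) for the
               COB segments of BOTH conjugate margins, where angle_deg is the
               angle between the segment strike and the velocity direction.

    1 mm/yr equals 1 km/Myr, so velocity times length is already in km^2/Myr.
    """
    total = 0.0
    for velocity, length, angle_deg in segments:
        # Only the segment length orthogonal to the velocity creates area.
        orthogonal_length = length * abs(math.sin(math.radians(angle_deg)))
        total += velocity * orthogonal_length
    # Both conjugate margins were integrated, so divide by 2.
    return total / 2.0


if __name__ == "__main__":
    # Two hypothetical conjugate 50 km segments opening orthogonally at 10 mm/yr.
    print(margin_formation_rate([(10.0, 50.0, 90.0), (10.0, 50.0, 90.0)]))
```

Integrating F over the syn-rift interval gives the margin area created, which is how the contributions of the slow and fast phases can be compared.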

With a rift length of more than 10,000 km, the South Atlantic Rift 
(Fig. 1) generated 2.1 × 10⁶ km² of rifted margin area, more than any
other Phanerozoic continental rift. During the first 25 Myr of extension, 
the Euler pole is located close to the equatorial Atlantic Rift”; therefore 
rifting is faster in the southern South Atlantic. The mean rift velocity 
remains relatively low (<10 mm yr⁻¹, full rate) until it increases rapidly
to more than 35 mm yr⁻¹ within 6 Myr. This speed-up at ~126–120
Myr ago coincides with severe loss of strength in the equatorial Atlantic 
Rift?, suggesting rift weakening as a controlling parameter. Both rift 
phases shape the rifted margins: about one-third of the South Atlantic 
margin area was formed during the slow, and two-thirds during the 
fast phase (Fig. 1e).

This two-phase velocity history is a common feature of many other 
rift systems, illustrated for the central North Atlantic, North America–Iberia,
Australia–Antarctica and South China Sea rifting in Fig. 2 and
for the Gulf of California, the northeast Atlantic and North America–Greenland
in Extended Data Figs 3 and 4. Consistently, the fast rift
phase starts ~10 Myr before inception of breakup and persists until 
plate separation is complete. Both the slow and the fast phase contribute 
markedly to shaping the rifted margins. All regional tectonic recon- 
structions used here, compiled from a range of independent studies, 
result in the same two-phase pattern, underlining the robustness of 
our results. Moreover, using alternative reconstructions for the South 
Atlantic does not change our conclusions (Extended Data Fig. 8). 

We conducted robustness tests for all case studies using alternative 
COBs defined by the extreme landward limit of basement that is not 
clearly continental crust. These COB polygons are located closer to 
the coastline, which shortens the duration of rifting by several million 
Figure 1 | Rift velocity evolution of the South Atlantic. a, b, Rift
velocities (coloured circles) are evaluated at overlapping plate polygons 
(black) with rift-ward polygon boundaries being defined through present- 
day COBs. Central SA, Central South Atlantic; Equ. SA, equatorial South 
Atlantic; NW Africa, Northwest Africa; S Africa, South Africa; S America, 
South America; South. SA, southern South Atlantic. c, Extension velocity 
evolution of all rift points showing the syn-rift and post-rift phase (thick 
and thin lines, respectively). d, Frequency of syn-rift velocities in terms of 
rift axis length. Colours display the integrated length of all rift segments 


years. Nevertheless, the contributions of both phases in shaping the 
rifted margins remain substantial (Extended Data Figs 5 and 6). 

To evaluate the geodynamic response of a rift zone to plate driving 
forces, we used two-dimensional analytical and numerical models. 
While the most common modelling approach is to prescribe a constant 
velocity at the model boundary”, here we use a force boundary condi- 
tion"! allowing for self-consistent computation of velocity evolution. 
The force boundary condition is applicable to major rifts where the 
integrated strength of the entire rift system is comparable to the plate 
driving forces, such as those considered here. 

First, we developed an analytical model of a homogeneous litho- 
spheric layer that deforms according to power law rheology under 
a constant force. For simplicity, we neglect depth-dependent thin- 
ning, non-constant temperature and compressibility; however, these 
processes are incorporated within the numerical models described 
later. The resulting velocity is $v = \frac{L}{n t_r}\left(1 - \frac{t}{t_r}\right)^{-1}$, where L is the
width of the necking zone, n the dislocation creep stress-exponent, 
deforming at the same velocity. The black line shows mean velocity of
all syn-rift segments. The fast rift phase starts ~126 Myr ago. e, The 
margin formation rate F at each rift segment of length L is computed by 
multiplying rift velocity v and velocity-orthogonal length of the segment 
L,. Total rift length is shown as grey dashed line. Timing of Parana- 
Etendeka flood basalts is depicted by volcano symbol. Black horizontal 
bars indicate diachronous breakup. For an animation of the kinematic 
evolution, see Supplementary Video 1. 


$t_r$ the duration of rifting and $t$ is time (see Methods for derivation).
The formula is plotted in Fig. 3c using typical parameter values of 
L = 100 km and $t_r$ = 20 Myr. Realistic values (ref. 12) for n range between 3 and
4, hence we approximate a purely viscous lithosphere by n = 3.5 (model 
A1). Introducing brittle failure as a highly nonlinear end-member of 
power law creep’, we also show solutions for n= 10 (model A2) and 
n=30 (model A3). In all cases, rifting commences slowly, yet veloc- 
ities increase within a few million years, where the short duration of 
speed-up is caused by the power law rheology of the lithosphere. 
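
The analytical solution can be evaluated directly to see how the stress exponent controls the abruptness of the speed-up. The Python sketch below is an illustration rather than the authors' code: the function name and default arguments are hypothetical, and the defaults simply reuse the parameter values quoted above (L = 100 km, rift duration 20 Myr, n = 3.5).

```python
def rift_velocity(t_myr, L_km=100.0, t_r_myr=20.0, n=3.5):
    """Extension velocity v = L / (n * t_r) * (1 - t / t_r)**-1.

    Times are in Myr and L in km, so the result is in km/Myr, which is
    numerically equal to mm/yr. The solution is defined for 0 <= t < t_r.
    """
    if not 0.0 <= t_myr < t_r_myr:
        raise ValueError("t must satisfy 0 <= t < t_r")
    return L_km / (n * t_r_myr) / (1.0 - t_myr / t_r_myr)


if __name__ == "__main__":
    # Compare the three analytical models A1-A3 at a few times before breakup.
    for n in (3.5, 10.0, 30.0):
        velocities = [rift_velocity(t, n=n) for t in (0.0, 10.0, 18.0, 19.5)]
        print(n, [round(v, 2) for v in velocities])
```

In this expression the velocity doubles only after half of the rift duration has elapsed and reaches ten times its initial value at 90% of the rift duration, so most of the acceleration is concentrated in the last few million years. A larger n lowers the velocity of the slow phase for the same L and rift duration.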
Numerical modelling accounts for more realistic evolution of defor- 
mation, temperature and buoyancy, but the resulting velocity evolu- 
tion is very similar to the analytical solution (Fig. 3c). The reason 
for the acceleration is a feedback mechanism that we term dynamic 
rift weakening: as long as the rift is strong, the extension rate is low 
(Fig. 3a), but with continued deformation the rift centre becomes 
successively weaker due to necking and strain softening (Fig. 3b). 
Loss of strength accelerates rifting, which results in new strength loss 
Figure 2 | Other rift systems. a–d, All depicted rifts exhibit a two-phase
velocity history. A rapid plate acceleration precedes inception of breakup
by ~10 Myr. Timing of Central Atlantic Magmatic Province is depicted
by volcano symbol in a. Corresponding map views are shown in Extended
Data Figs 1-3. For an animation of each rift kinematic evolution, see
Supplementary Videos 2-5.
Figure 3 | Analytical and numerical model with force boundary
conditions. a, b, Strength evolution of two-dimensional numerical rift
model. Arrows at lateral model sides indicate extension velocity. c, Velocity
evolution of analytical (Analyt.) models (A1-A3) and numerical (Num.)
experiments (M1-M4). Analytical solution A1 employs a stress-exponent
of n = 3.5, assuming an entirely viscous lithosphere, while A2 and A3 
also account for brittle crustal deformation by using n = 10 and n= 30, 
respectively. Numerical reference model M1 is described in Methods and 
Extended Data Table 1. Other models are identical to M1, but apply a fully 


and causes conjugate rift sides to accelerate rapidly'*!°. We show ten 
models with varying rheological flow laws, thermal configurations, 
layer thicknesses, frictional softening and thermal expansivity (Fig. 3), 
which reproduce a large variety of rifted margin configurations 
(Extended Data Fig. 7). The two-phase behaviour and the abruptness of 
speed-up are robustly represented by any of these models. The numer- 
ical experiments also demonstrate that a variation of the extensional 
tectonic forces applied at the model boundaries affects the duration of 
the first, slow rift phase. Increasing the boundary force leads to earlier 
rift acceleration and breakup, while reducing the force prolongs the 
slow rift phase, or even generates a failed rift where conductive cooling 
and thermal strengthening decrease the extension rate until the rift 
becomes abandoned (Extended Data Fig. 7). We conclude that our plate 
kinematic and theoretical analyses independently suggest two-phase 
velocity behaviour as a key feature of successful rifts, which should 
have affected any rifted continental margin. Numerical modelling with 
realistic material properties as conducted here brackets the duration of 
rift-induced plate speed-up to between 2 and 10 million years. 

Our results have profound implications for the interpretation of pas- 
sive margin structures. A variety of studies illustrate striking differences 
between proximal and distal rifted margin domains*"®. This dichotomy 
can be attributed to the suggested two-phase velocities during basin- 
ward localization: the slow phase shapes the proximal margin, while 
the fast phase dominates the distal margin where our analysis predicts 
larger fault-slip rates, faster subsidence, higher heat flow, enhanced 
partial melting and associated underplating or volcanism. 

This study provides an alternative explanation for the often enig- 
matic lack of extension interpreted from the tectono-stratigraphic 
record along many rifted margins’. Both reconstructions and 
modelling suggest that most of the new area of crust formed during 
rifting is created in a comparably short period of time towards the 
end of the syn-rift phase, when strain is likely to have localized to 
the distal part of the margins. This may explain why extension 
estimates from syn-rift faulting, typically biased towards the 
proximal margin areas, and interpreted using concepts borrowed from 
failed rifts where the feedback process we describe never occurred, 
felsic crust (M2), a comparably thick crust (M3), or a thick lithosphere
(M4), respectively. The duration of plate speed-up is a few million years for 
any of these configurations. The generated margin structures of M1-M4 
are displayed along with six alternative models in Extended Data Fig. 7. 

d, The margin area of M1 consists of two major parts, formed during the 
slow and the fast phase, respectively. A constant rift length of 1,000 km 

is assumed. e, Rheological setup of models M1-M4. LAB, lithosphere- 
asthenosphere boundary; Moho, crust-mantle boundary. 


commonly underestimate the total extension indicated by whole- 
crustal thinning. 

Our results are applicable to any rifted margin, whether volcanic or 
magma-poor. Yet, the evolution of specific rifts will be modulated by 
other factors affecting the force budget and the rift strength such as 
lithospheric heterogeneity, rift obliquity'’, active rifting due to asthe- 
nospheric upwelling'® and diking, as well as plume arrival’. Plumes 
contribute to breakup by reducing the strength of the rift’’, however, 
there can be a considerable delay between plume arrival and abrupt 
plate acceleration”. While any combination of these processes may 
influence how lithospheric strength evolves within individual rift 
systems, all successful rifts are expected to experience the proposed 
strength-velocity feedback before breakup. 

Owing to the high viscosity of the lower mantle, only lithospheric 
and upper mantle processes can affect plate movements at timescales 
of a few million years”’. It has been proposed that abrupt plate acceler- 
ations can be caused by plume-lithosphere interaction”, subduction 
initiation’, and slab detachment” possibly induced by ridge 
subduction”®. However, none of these mechanisms explain our result 
that plate speed-up systematically precedes continental breakup. 
While the present motions of Earth's plates are governed by slab pull, 
basal drag and ridge push, we propose that abrupt plate acceleration 
during continental rifting is controlled by the rapid decrease of rift 
strength. 

Dynamic rift weakening presents a new explanation for several 
previously unexplained rapid absolute plate motion changes, often 
appearing as cusps or kinks in apparent polar wander (APW) paths. 
A recent review”® found four cusps in global APW paths during the 
last 200 Myr and each of them can be associated with rift velocity 
speed-up during major continental rifting events: (1) 190 Myr ago— 
central North Atlantic opening; (2) 150 Myr ago—separation of East 
and West Gondwana; (3) 125 Myr ago—South Atlantic Rift and the 
split between India and Antarctica; and (4) 50 Myr ago—northeast 
Atlantic opening. A global-scale plate reorganization at ~100 Myr ago 
(ref. 27) corresponds to an increase in rift velocity between Australia 
and Antarctica, and the end of a standstill in the APW path for South 
America as the final continental connection with Africa is broken®. 
We suggest that absolute plate motion changes are strongly related to 
continental breakup, allowing a linkage between palaeomagnetic data 
and geological evidence in reconstructing the dynamics of previous 
supercontinents such as Pangea, Rodinia and Nuna. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Rift kinematics. Quantitative restoration of continents to their pre-rift configu- 
ration during Pangea breakup involves estimating the amount of syn-rift exten- 
sion from present-day crustal thickness, and accounting for uncertainties in these 
estimates (refs 28, 29). The time of onset of rifting between two plates is constrained by
geological evidence such as the ages of oldest syn-rift sediments and rift-associated 
volcanism. The direction and rate of divergence during rifting can be reconstructed 
by careful consideration of a diverse range of geological indicators such as seis- 
mic tectono-stratigraphy, dating of exhumed and volcanic rocks dredged/drilled 
within continent-ocean transitions, and fitting constraints from sections of the 
plate boundaries beyond the rift zone”*°"!. Note that in Mesozoic/Cenozoic tec- 
tonic reconstructions, plate rotations have to be discretized, whereas typical stage 
lengths are 5-10 Myr or even longer as observations usually do not permit building 
plate kinematic models with smaller stages. Our analysis combines independently 
conducted reconstructions (Supplementary Table 1) that account for recent 
geological and geophysical data sets. 

COBs. The definition of COBs contains considerable uncertainties for many 
margins—indeed, the definition of a COB as a sharp boundary is conceptually 
problematic, with interpretations of geophysical data highlighting the complex 
crustal architecture within the transition from continental to oceanic domains”. 
Regions of transition can be several tens of kilometres wide, with complexities 
that vary between margins closer to volcanic or non-volcanic end-member 
scenarios. Our starting point for defining COBs were the geometries defined by 
Seton et al.°. We modified these using a synthesis of COB interpretations com- 
piled from published crustal-scale geophysical data sets (see Supplementary 
Table 2). These data are primarily derived from seismic refraction experiments, 
but interpretations of crustal structure based on seismic reflection and gravity 
modelling are also included for regions where refraction data are sparse. Additional 
seismic constraints come from the data set of Winterbourne et al.**, who iden- 
tified unequivocal oceanic crust adjacent to continental margins along seismic 
profiles including some industry data, which are otherwise unavailable. The 
synthesis of margin-perpendicular profiles gives us a series of tie points along 
each margin, with which our COBs must be broadly consistent. To define COB 
polylines, we must interpolate between these tie points, which we did guided 
by first-order trends in maps of gravity derivatives and magnetic anomalies”’. 
However, for the specific purposes of this study, an important consideration is 
that we use the COBs to define the orientation of the rift. For this reason, we 
have used deliberately simplified COB geometries with orientations that represent 
the first-order trend of each rifted margin. Using these constraints, we generated 
alternative COB geometries to define a range of possible COB locations. Our 
preferred COB set lies relatively ocean-ward, so that it includes areas where the 
basement is interpreted to comprise exhumed mantle or seaward dipping reflectors, 
but not basement formed by sea-floor spreading processes. To test how sensitive 
our results are to our COB interpretation, we generated a second set of COB 
geometries, defining the extreme landward limit of basement that is not clearly 
continental crust (see Extended Data Figs 5 and 6). We stress that the COBs used in 
this study define an envelope of possible COB locations suitable for our sensitivity 
tests, and are not a natural replacement for ‘best-fitting’ COB locations interpreted 
in other studies. 

Observational evidence for timing of rift initiation. Our results, and particularly
the occurrence of a two-phase, ‘slow-fast’ pattern in reconstructions of continental
rifting, are sensitive to the age assigned to the onset of rifting. The condition for the
slow-fast trend to disappear would be if the rift onset ages in our reconstruction 
model are erroneously old, such that rifting began later and proceeded (from the 
same full-fit configuration) at a faster rate. Hence, to establish the robustness of 
the slow-fast trend, we summarize geological evidence for the minimum age at 
which rifting began in each of the rift systems illustrated in our study, as well as 
observational evidence for accelerations in rift velocity that lend weight to our 
kinematic reconstructions. 

South Atlantic Rift. We model South Atlantic rifting using the reconstruction of 
Heine et al.’, with onset of slow rifting at ~150 Myr ago followed by acceleration 
at ~125 Myr ago. An extensive study linking biostratigraphy, lithostratigraphy 
and timing of deformation within basins along both African and South Atlantic 
conjugate margins north of the Walvis Fracture Zone* indicates that rifting was 
established by Berriasian times (>140 Myr ago). South of the Walvis Fracture Zone, 
the main phase of rifting between South America and southern Africa probably 
began earlier, in the Late Jurassic, following widespread, isolated Triassic—Jurassic 
rift basin development within southern South America®’. The timing of slow rift 
onset and subsequent acceleration are consistent with earlier reconstructions”. 
A reconstruction invoking later, post-Aptian age of breakup between salt basins 
on the Brazilian and Angolan conjugate margins*” has been proposed, but detailed 
seismic imaging and drilling of syn-rift sediments within the rifted margins, com- 
bined with basin subsidence histories*® argue against this later breakup scenario. 
Further supporting our model, recent interpretation of magnetic anomalies in 
the southern South Atlantic within crust formed during the Cretaceous Normal 
Superchron constrains changes in the rate and direction of plate motions during 
breakup®, showing rapid acceleration of plate divergence during initial breakup 
in the southernmost South Atlantic, contemporaneous with the final stages of 
rifting further north. 

Central North Atlantic Rift. Our base reconstruction follows the work of Kneller 
et al.”*, who assigned a rifting onset age of 240 Myr ago, and predicts a speed-up at 
~200 Myr ago. The speed-up occurs around the time of Central Atlantic Magmatic 
Province (CAMP) volcanism, which represents an important time marker within 
the rift evolution. Stratigraphic evidence from syn-rift sediments within basins 
along the eastern margins of North America®“ and the conjugate Northwest 
African margins“! shows that continental rifting was active for at least 25 Myr 
before CAMP magmatism. Rapid speed-up of rifting is evidenced by a rate of 
sediment accumulation within these basins that increases drastically within 1-5 Myr 
before 200 Myr ago’. 

North America-Iberia. Our reconstruction proceeds from onset of rifting at 200 
Myr ago with a speed-up at ~145 Myr ago. Recent, alternative reconstructions” 
show a similar increase in extension velocities at the end of the Jurassic, but with 
a slightly earlier initiation of rifting (~203 Myr ago). Evidence for widespread 
rifting is recorded in basins along the Newfoundland and Iberian margins begin- 
ning in the late Triassic, but resulting in little crustal thinning; a second phase 
beginning in the late Jurassic led to marked thinning and breakup in the Early 
Cretaceous***. Sequential restoration®” yields post-145 Myr ago extension velocities
of 1-2 mm yr⁻¹, consistent with our reconstructions. The rift velocity before
~145 Myr ago depends on the assumed age of rift onset. Taking the latest possible 
onset of rifting, Oxfordian (~161 Myr ago), modelled extension rates remain fairly 
constant throughout rifting. However, any earlier onset of rifting (for example, 
Triassic-Early Jurassic) as indicated by evidence listed above, would result in slower 
initial rifting followed by a Late Jurassic acceleration, consistent with our recon- 
struction model. The slow-fast velocity evolution is further supported by a study 
that uses previously unpublished seismic and borehole data to show continuous 
rifting in the basins along western Iberia beginning in the Triassic (~210 Myr 
ago), with three rift cycles, ending at 144 Myr ago. These authors find that the 
subsidence is relatively slow in the first two rift phases, then increases rapidly in 
rate during a rift climax in Late Oxfordian-Kimmeridgian times (~160-152 Myr 
ago) coinciding with the timing of our speed-up. 

Australia—Antarctica. Our reconstruction places the onset of rifting between 
Australia and Antarctica in the late Jurassic (~165-155 Myr ago), with an increase 
in rift velocity around 100 Myr ago. The earliest rift-fill comprises Callovian—Early 
Berriasian sediments (>160 to ~140 Myr ago), constrained by palynological dating 
of samples from the Polda, Bremer and Eyre basins“. Detailed tectono-stratigraphic 
analysis‘ and sequential structural restoration of interpreted seismic sections*® 
point towards slow rifting during the Jurassic-Early Cretaceous, followed by a 
rapid acceleration in crustal thinning and subsidence at the beginning of the 
Late Cretaceous (beginning around 102 or 93.5 Myr ago)***”. Shortly thereafter, 
breakup begins in the westernmost part of the rift system (dated by exhumation 
fabrics and volcanics), propagating eastwards over tens of millions of years**. 

South China Sea. Reconstructions of the South China margins indicate rapid 
extension and breakup beginning in the late Eocene”. Earlier extension devel- 
oped from ~60 Myr ago within a former Andean-style margin, while onset of 
slow extension before the late Eocene speed-up is recorded by minor volcanism 
in rift-related basins”’. This is further supported by subsidence and strain-rate 
analyses of wells and stratigraphic sections for basins within the northern margin 
of the South China Sea*!. 

Gulf of California. The speed-up in our reconstruction occurs around 12 Myr ago, 
corresponding to a phase of abruptly increased rift velocity and obliquity inferred 
from widespread structural markers*”. Phases of continental extension before the 
mid-Miocene are recorded by tectono-stratigraphic relationships and dating of 
rift-related volcanics and plutons****. These data indicate more diffuse exten- 
sion at significantly lower rates: Ferrari et al.°> estimate that the relative motion 
between the conjugate margins of the Gulf of California proceeded at an average 
of 7.7 mm yr⁻¹ from ~30-18 Myr ago, and 8.3 mm yr⁻¹ from ~18-12 Myr ago, 
consistent with our computed values. 

North America-Greenland. Initiation of rifting by ~140 Myr ago is substantiated 
by dating of rift-related volcanics and biostratigraphy>°. Starting from ~120 Myr 
ago, rift basin sedimentation is evident throughout the Cretaceous”. Rifting and 
subsidence rates were slow until a rapid increase around 70-80 Myr ago”, 
around the time of an increase in rift velocity and subsequent initiation of sea- 
floor spreading. 

Northeast Atlantic. Our reconstruction of relative motion between Greenland 
and Eurasia before the oldest sea-floor spreading magnetic anomalies in the north- 
east Atlantic (C24, ~53 Myr ago**) incorporates plate circuit computations using 
constraints from the rift systems between Greenland, North America and Eurasia. 
The reconstruction shows slow extension in the Jurassic to Early Cretaceous 
followed by tectonic quiescence or modest mid-Cretaceous extension®™™, and a 
pronounced speed-up in the Late Cretaceous. The acceleration in Late Cretaceous 
rifting indicated by the reconstructions is more tightly constrained from spreading 
histories in the adjacent basins. The most important phase of rifting that ultimately 
led to breakup is constrained to the Latest Cretaceous—Early Paleocene. Skogseid®! 
proposed that enhanced syn-rift deposition took place between 75 and 62 Myr ago 
based on tectono-stratigraphy and subsidence analysis from seismic and well data 
from the Voring margin. 

End-member models of the South Atlantic Rift. We performed our analysis on 
several end-member models for the South Atlantic Rift (Extended Data Fig. 8). 
These models differ in terms of the timing of final South Atlantic breakup, which is 
difficult to constrain owing to the Cretaceous superchron. Models that feature 
a late breakup**” also show a late speed-up at 110 Myr ago, whereas models with an 
earlier breakup’ depict an earlier speed-up at 120 Myr ago. In all cases, 
initially slow rift velocities are followed by large plate accelerations that precede 
the final breakup by 10-15 Myr. 

Another major difference between these models is the plate-internal deformation 
within South America: the Heine et al. model’ uses a largely intact South 
American plate where deformation occurs only along the border to Patagonia, 
whereas Moulin et al. separate South America into eight individual plates. We apply 
our analysis under two premises: (1) a three-plate scenario (West 
Africa, South Africa, South America), shown in the upper panel of rift velocity 
diagrams (Extended Data Fig. 8b); and (2) a four-plate analysis that accounts for 
South America-internal deformation, in which the northern and the southern parts of 
South America are evaluated independently (the two bottommost panels in 
Extended Data Fig. 8b). 

We find that plate models that feature large intra-plate deformation (refs 36, 37, 62) 
display two distinct speed-up events: first, the southern plates accelerate several 
million years before breakup in the southern South Atlantic; second, the northern 
plates accelerate before breakup in the equatorial Atlantic. Plate models with less 
internal deformation, such as the Heine et al. model, display only a minor acceler- 
ation before southern South Atlantic opening whereas the largest speed-up occurs 
at 120 Myr ago before equatorial Atlantic breakup. While the relative importance of 
each speed-up depends on the amount of South America-internal deformation, all 
models illustrate plate acceleration before breakup of the controlling rift segment. 

Database. The entire rift velocity database is accessible via an open-access virtual- 
globe web interface through http://portal.gplates.org/cesium/?view=rift_v. This 
database contains the rift velocity history of any point at a major post-Pangea rifted 
margin. The velocity history can be visualized online and downloaded, lending 
itself as a source of tectonic boundary conditions for basin analysis software and 
geodynamic forward models. 

Numerical model setup. We apply the finite element code SLIM3D® to solve the 
coupled system of conservation equations for momentum, thermal energy and 
constitutive equations. The reference model M1 consists of four distinct petrological 
layers: 25 km of felsic upper crust®, 10 km of mafic lower crust®, a lithospheric 
mantle dominated by dry olivine rheology® that extends to a depth of 120 km, and 
weak asthenospheric material below 120 km depth that is represented through wet (that 
is, 1,000 p.p.m. H/Si) olivine rheology®. The entire model comprises a rectangular 
domain of 150 km depth and 500 km width, with 2 km resolution. 

We apply dynamic boundary conditions at the lateral model sides, such that 
during rifting, the boundary force is kept constant, allowing for self-consistent 
evolution of extensional velocities. This approach is feasible if the model domain 
represents a large region whose strength is a major component in the overall force 
balance of the involved plates. The constant boundary force is maintained in our 
model until extensional velocities reach typical sea-floor spreading rates. Thereafter, 
the low rift strength becomes negligible in the force balance, and we use velocity 
boundary conditions with a rate of 40 mm yr⁻¹. At the top boundary we use a free 
surface while at the bottom side isostatic equilibrium is realized by means of the 
Winkler foundation, where in- and outflow of material is accounted for during 
re-meshing. Deformation is accommodated by elasto-visco-plastic rheology so 
that the model self-consistently reproduces diverse lithospheric-scale deformation 
processes such as faulting, flexure and lower crustal flow. Viscous flow occurs 
via two creep mechanisms: diffusion and dislocation creep. The Mohr-Coulomb 
failure model is implemented for brittle deformation. 

The thermal state at the model start is a steady-state temperature distribution 
resulting from each layer’s heat conductivity, radiogenic heat production and the 
following boundary conditions: (1) lateral boundaries are thermally isolated, 
(2) the temperature at the surface is 0°C, (3) below the lithosphere-asthenosphere 
boundary the temperature is set to 1,350°C. To avoid rift localization at the model 
boundaries, a small thermal heterogeneity is introduced in the model centre. 
We introduce this heterogeneity of triangular shape and 20 km width by elevating 
the initial 1,350°C isotherm up to 10 km before thermal equilibration®. All 
rheological and thermal parameters are given in Extended Data Table 1. 
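
The initial thermal state described above can be approximated with a short one-dimensional sketch. It assumes purely vertical steady-state conduction through the three lithospheric layers, with conductivities and radiogenic heat production as read from Extended Data Table 1 (the upper-crust conductivity is garbled in the source and is assumed here to be 2.5 W K⁻¹ m⁻¹). The function name and the 1 km grid spacing are illustrative choices, and the central thermal heterogeneity is ignored; this is not the SLIM3D implementation.

```python
def steady_state_geotherm(layers, t_surface=0.0, t_base=1350.0, dz=1000.0):
    """1D steady-state conductive geotherm for a stack of layers.

    layers: list of (thickness_m, conductivity_W_per_m_K, heat_production_W_per_m3),
            ordered from the surface downwards.
    Returns (depths_m, temperatures_C, surface_heat_flow_W_per_m2).
    """
    # Build cell-wise conductivity k and heat production h on a regular grid.
    k_cells, h_cells = [], []
    for thickness, k, h in layers:
        n = round(thickness / dz)
        if abs(n * dz - thickness) > 1e-9:
            raise ValueError("layer thickness must be a multiple of dz")
        k_cells += [k] * n
        h_cells += [h] * n

    # Steady state: k dT/dz = q_s - (heat produced above depth z).
    # Integrate once with unknown surface heat flow q_s, then fix q_s
    # from the temperature prescribed at the base.
    sum_inv_k = 0.0    # integral of 1/k over depth
    sum_hcum_k = 0.0   # integral of (cumulative heat production)/k over depth
    hcum = 0.0
    for k, h in zip(k_cells, h_cells):
        h_mid = hcum + 0.5 * h * dz  # cumulative production at the cell centre
        sum_inv_k += dz / k
        sum_hcum_k += h_mid * dz / k
        hcum += h * dz
    q_s = (t_base - t_surface + sum_hcum_k) / sum_inv_k

    # Second pass: march downwards from the surface with the known q_s.
    depths, temps = [0.0], [t_surface]
    hcum = 0.0
    for k, h in zip(k_cells, h_cells):
        h_mid = hcum + 0.5 * h * dz
        temps.append(temps[-1] + (q_s - h_mid) * dz / k)
        depths.append(depths[-1] + dz)
        hcum += h * dz
    return depths, temps, q_s


# Upper crust, lower crust, lithospheric mantle (thickness m, k, H).
LAYERS = [(25e3, 2.5, 1.5e-6), (10e3, 2.5, 0.2e-6), (85e3, 3.3, 0.0)]

depths, temps, q_s = steady_state_geotherm(LAYERS)
print(f"surface heat flow: {q_s * 1e3:.1f} mW/m^2")
print(f"temperature at 35 km depth: {temps[35]:.0f} C")
```

Because heat production is constant within each layer, the midpoint integration is exact at the grid nodes, so the profile honours both boundary temperatures to rounding error.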

Analytical solution. Here we derive a transparent analytical solution for finite 
amplitude necking of a homogeneous viscous layer consisting of a power-law 
material. A horizontal layer of initial thickness $D_0$ is extended by a constant line 
force $F$ that is applied parallel to the layer. In an incompressible, free layer the 
mean layer-parallel deviatoric stress $\tau$ is half of the total stress, $F/D_0$ (ref. 67), that 
is, $\tau = \tfrac{1}{2}F/D_0$. The deviatoric stress further relates to the strain rate through the 
power law $\dot{\varepsilon} = B\tau^{n}$, where the pre-exponential factor $B$ and the stress exponent $n$ are 
material parameters. Note that $B$ is often considered to be temperature-dependent 
with $B = A \exp(-E/(RT))$. On the basis of these relations a characteristic viscosity 
$\eta_c = \tau/\dot{\varepsilon} = \tau^{1-n}/B$ and a characteristic time $t_c = \eta_c/\tau = (1/B)\,(\tfrac{1}{2}F/D_0)^{-n}$ can be 
defined®. 
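
A minimal numerical sketch of these definitions follows. The function names are illustrative, and the value of B in the usage lines is a hypothetical placeholder chosen only to produce a geologically plausible timescale; it is not a parameter from Extended Data Table 1.

```python
import math

R_GAS = 8.314462618  # universal gas constant, J mol^-1 K^-1


def pre_exponential_factor(a, activation_energy, temperature_k):
    """B = A * exp(-E / (R T)) for a temperature-dependent power law."""
    return a * math.exp(-activation_energy / (R_GAS * temperature_k))


def characteristic_scales(force, thickness, b, n):
    """Return (tau, strain_rate, eta_c, t_c) for a layer under line force F.

    tau = F / (2 D0), strain_rate = B tau^n,
    eta_c = tau / strain_rate, t_c = eta_c / tau.
    """
    tau = 0.5 * force / thickness
    strain_rate = b * tau ** n
    eta_c = tau / strain_rate
    t_c = eta_c / tau
    return tau, strain_rate, eta_c, t_c


# Force and thickness as quoted for the rift configuration in the text;
# B_HYPOTHETICAL is an assumed value (units Pa^-n s^-1), not from the paper.
B_HYPOTHETICAL = 1e-41
tau, rate, eta_c, t_c = characteristic_scales(8e12, 100e3, B_HYPOTHETICAL, 3.5)
SECONDS_PER_MYR = 3.15576e13
print(f"tau = {tau / 1e6:.0f} MPa, t_c = {t_c / SECONDS_PER_MYR:.2f} Myr")
```

Note that the characteristic time is simply the inverse of the initial strain rate, so it depends very strongly on the applied force through the exponent n.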

Owing to mass conservation and incompressible flow, the horizontal stretching 
of the layer has to be balanced by vertical thinning: $(1/L)\,dL/dt = -(1/D)\,dD/dt$, 
where $L$ is the length of the layer. The resulting extensional velocity $v = dL/dt$ can 
thus be written as 


$$v = -\frac{L}{D}\,\frac{dD}{dt} \qquad (1)$$


We assume that the upper and lower boundaries are traction-free and that no 
depth-dependent thinning occurs. These assumptions allowed derivation of 
closed analytical solutions for necking instabilities in boudinage mechanics and 
slab detachment® involving the time-dependent layer thickness $D$ that can be 
expressed as: 


$$D = D_0\,(1 - t/t_r)^{1/n} \qquad (2)$$


where the time until rupture $t_r$ relates to the aforementioned characteristic time™ 
$t_c$ such that $t_r = t_c/n$. 

Combining equation (1) and equation (2) yields the formula for the time- 
dependent extension velocity: 


$$v = \frac{L}{n\,t_r}\left(1 - \frac{t}{t_r}\right)^{-1} \qquad (3)$$
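
For readers who want the intermediate step, the passage from equations (1) and (2) to equation (3) can be written out as follows. This only differentiates the thickness law and substitutes it into the velocity expression; it introduces no assumption beyond those already stated.

```latex
\begin{align}
\frac{dD}{dt} &= -\frac{D_0}{n\,t_r}\left(1-\frac{t}{t_r}\right)^{\frac{1}{n}-1}, \\
v &= -\frac{L}{D}\,\frac{dD}{dt}
   = \frac{L}{n\,t_r}\,
     \frac{\left(1-t/t_r\right)^{\frac{1}{n}-1}}{\left(1-t/t_r\right)^{\frac{1}{n}}}
   = \frac{L}{n\,t_r}\left(1-\frac{t}{t_r}\right)^{-1}.
\end{align}
```

The velocity therefore starts at $L/(n\,t_r)$ and grows without bound as $t$ approaches $t_r$.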


To apply the analytical results to continental rifting, we use parameters that 
describe a typical rift configuration (lithospheric thickness 100 km; mean 
lithospheric temperature $T = 600$°C; applied tectonic force 8 TN m⁻¹; width of the 
necking zone $L = 100$ km; duration of rifting $t_r = 20$ Myr). The stress exponents of 
lithospheric materials range between 3 and 4, hence a purely viscous lithosphere 
can be approximated by $n = 3.5$. However, this approach neglects the existence of 
brittle deformation that is evidenced at real plate boundaries by ubiquitous faulting. 
Brittle failure can be represented as an end-member of power law creep, if stress 
exponents up to 30 are used’”. The analytical solution is plotted in Fig. 3 for three 
cases: $n = 3.5$ (A1), $n = 10$ (A2), $n = 30$ (A3). Despite the simplicity of the analytical 
model, the numerical solutions of lithospheric necking are very similar to analytical 
solutions (Fig. 3). Hence, the analytical calculation corroborates our conclusion 
that it is the rapid loss of lithospheric strength during continental rifting that is 
responsible for the abrupt increase of extension velocity. 
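
The behaviour of equation (3) for the three cases can be tabulated with a few lines of code. This is a sketch that assumes a constant necking-zone width of 100 km and a rupture time of 20 Myr as quoted above; the function name and the sample times are illustrative.

```python
def extension_velocity(t_myr, n, width_km=100.0, rupture_time_myr=20.0):
    """Equation (3): v = L / (n t_r) * (1 - t / t_r)^-1, in km/Myr = mm/yr."""
    if not 0.0 <= t_myr < rupture_time_myr:
        raise ValueError("t must satisfy 0 <= t < t_r")
    return width_km / (n * rupture_time_myr) / (1.0 - t_myr / rupture_time_myr)


# Cases A1-A3 from the text, sampled early, midway and shortly before rupture.
for label, n in (("A1", 3.5), ("A2", 10.0), ("A3", 30.0)):
    row = ", ".join(f"{extension_velocity(t, n):6.2f}" for t in (0.0, 10.0, 18.0, 19.5))
    print(f"{label} (n = {n:g}): {row} mm/yr")
```

In every case the velocity merely doubles during the first half of the rift history and then rises steeply close to rupture, which is the slow-fast pattern discussed in the text.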

Extended Data Figure 1 | South Atlantic Rift and the central North 
Atlantic Rift. a-h, The maps depict snapshots of the slow and fast rift 
phase in the South Atlantic Rift (a, b) and central North Atlantic Rift 
(e, f). We corroborate the inferred velocity history with key temporal 
constraints***>78! from geological and geophysical observations (c, g). For 
animations of the kinematic evolution, see Supplementary Videos 1 and 2. 
[Figure panels: maps of the slow and fast rift phases (South Atlantic Rift at 135 and 120 Myr ago; central North Atlantic Rift at 210 and 190 Myr ago); plots of full rift velocity (mm/yr) and margin formation rate (1000 km²/Myr) against time before present (Myr ago).] 

© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 

Extended Data Figure 2 | North America-Iberia Rift and the Australia- 
Antarctica Rift. a-h, The maps depict snapshots of the slow and fast 
rift phase in the North America-Iberia Rift (a, b) and the Australia- 
Antarctica Rift (e-h). We corroborate the inferred velocity history 
with key temporal constraints**"*** from geological and geophysical 
observations (c, g). For animations of the kinematic evolution, see 
Supplementary Videos 3 and 4. 

phase in the South China Sea opening (a, b) and the Gulf of California [...] animations of the kinematic evolution, see Supplementary Videos 5 and 6. 

northeast Atlantic opening. a-i, The maps depict snapshots of the slow 
and fast rift phase in the North America-Greenland Rift (a, b) and the [...] 
observations (c, h). For animations of the kinematic evolution, see 
Supplementary Videos 7 and 8. 

Extended Data Figure 5 | Data coverage for construction of COBs. 
We restrict our analysis to regions where seismic refraction data for both 
conjugate margins are available. Seismic refraction profiles are shown in 
blue, together with ‘point’ deep crustal seismic soundings linked together 
by gravity modelling. Red points represent a mixture of sonobuoys and 
deep reflection profiles. All references for displayed data are listed in 
Supplementary Table 2. Our preferred set of COBs (green) includes some 
areas where the basement is interpreted to comprise exhumed mantle 
or seaward dipping reflectors, but not basement formed by sea-floor 
spreading processes. The alternative set of COB geometries, defining the 
extreme landward limit of basement that is not clearly continental 
crust, is shown in yellow. Underlying image shows global free-air gravity 
field®. 

Extended Data Figure 6 | Results using alternative, continent-ward set of COBs. The COB set is shown in Extended Data Fig. 5 as yellow polygons. 
Breakup takes place earlier, yet the two-phase evolution is robustly represented in this end-member scenario. 

Extended Data Figure 7 | Final margin structures of numerical 
experiments. Using model M1 as our reference model, we vary layer 
thickness, rheological flow laws, the thermal configuration, frictional 
softening, and thermal expansivity to compute models M2-M10. M5 uses 
a comparatively weak quartzite flow law”’. The final margin structures 
feature a wide range of rifted margin geometries reproducing all observed 
configurations of wide, narrow, symmetrical and asymmetrical margins. 
Depending on rheological evolution, extension is accommodated by 
brittle faults, ductile shear zones or both. For all cases, the associated 
time-dependent extension velocity (shown on the right in blue) exhibits 
the characteristic two-phase behaviour of slow rifting during the first 
rift phase, speed-up during lithospheric necking and fast rifting before 
breakup. Blue lines correspond to the final margin structures on the 
left and represent model runs where the boundary force coincides with 
the integrated strength of the yield strength profiles. Grey lines depict 
parameter variations where the boundary force is larger or smaller than 
the lithospheric strength resulting in two-phase velocities with an earlier 
speed-up, or decreasing rift activity reproducing failed rifts, respectively. 

Extended Data Figure 8 | Alternative South Atlantic plate tectonic 
reconstructions. a, b, Several end-member models are shown that differ 
in terms of the timing of final South Atlantic breakup, and intra-plate 
deformation. a, Map view evolution. b, Frequency of extension velocity 
considering the entire South Atlantic (top) or only the Northern and 
Southern Plates (middle and bottom, respectively). Southern plates are 
depicted as bold polygons in the map view (a). Plate models”** with a final 
breakup at ~110 Myr ago depict a speed-up at 125-120 Myr ago, while 
models with a later breakup**” at ~100 Myr ago also involve a later rift 
acceleration at ~110 Myr ago. Reconstructions in which large intra-plate 
deformation*®*”* decouples northern and southern South America 
display first a speed-up of southern South America followed by a distinct 
speed-up of northern South America. Plate models with less internal 
deformation (for example, ref. 7) exhibit a minor acceleration of the 
southern plates followed by a large acceleration of entire South America. 
In all cases, plate kinematics show major speed-up about 10 Myr before 
breakup of the controlling rift segment. 

Extended Data Table 1 | Thermo-mechanical reference parameters 

Parameter                      Units         Upper crust  Lower crust  Lithospheric mantle  Asthenospheric mantle
Density                        kg m⁻³        2700         2850         3280                 3300
Thermal expansivity            10⁻⁵ K⁻¹      2.7          2.7          3.0                  3.0
Bulk modulus                   GPa           55           63           122                  122
Shear modulus                  GPa           36           40           74                   74
Heat capacity                  J kg⁻¹ K⁻¹    1200         1200         1200                 1200
Heat conductivity              W K⁻¹ m⁻¹     2.5          2.5          3.3                  3.3
Radiogenic heat production     µW m⁻³        1.5          0.2          0.0                  0.0
Initial friction coefficient*  -             0.5          0.5          0.5                  0.5
Cohesion                       MPa           5.0          5.0          5.0                  5.0

Rheology                                          Units      Wet quartzite  Wet quartzite  Wet anorthite  Dry olivine  Wet olivine
Flow law reference                                           bd             Li             35             s            sd
Pre-exponential constant for diffusion creep     Pa⁻¹ s⁻¹   -              -              -              2.25e-09     1.5e-09
Activation energy for diffusion creep            kJ/mol     -              -              -              375          335
Activation volume for diffusion creep            cm³/mol    -              -              -              6            4
Pre-exponential constant for dislocation creep   Pa⁻ⁿ s⁻¹   8.57e-28       1.54e-17       1.79e-15       6.51e-16     2.12e-15
Power law exponent for dislocation creep         -          4.0            2.3            3.0            3.5          3.5
Activation energy for dislocation creep          kJ/mol     223            154            356            530          480
Activation volume for dislocation creep          cm³/mol    0              0              0              13           11

These listed parameters are used unless indicated otherwise. Pre-exponential constants of the flow laws®* ©®7° have been recalculated to account for flow laws written as a function of second invariants of stress and strain rate. 

*During frictional strain softening, the friction coefficient reduces linearly from 0.5 to 0.05 for brittle strain between 0 and 1. For strains larger than 1, it remains constant at 0.05. 

Emergence of a Homo sapiens-specific gene family 
and chromosome 16p11.2 CNV susceptibility 


Xander Nuttle!*, Giuliana Giannuzzi**, Michael H. Duyzend!, Joshua G. Schraiber', Iñigo Narvaiza*, Peter H. Sudmant't, 
Osnat Penn!, Giorgia Chiatante*, Maika Malig!, John Huddleston!°, Chris Benner*, Francesca Camponeschi®, 

Simone Ciofi-Baffoni®’, Holly A. F. Stessman!, Maria C. N. Marchetto?, Laura Denman!, Lana Harshman!, Carl Baker', 
Archana Raja, Kelsi Penewit', Nicolette Janke!, W. Joyce Tang®, Mario Ventura*, Lucia Banci®’, Francesca Antonacci4, 
Joshua M. Akey!, Chris T. Amemiya®, Fred H. Gage*’, Alexandre Reymond? & Evan E. Eichler!® 


Genetic differences that specify unique aspects of human evolution 
have typically been identified by comparative analyses between the 
genomes of humans and closely related primates’, including more 
recently the genomes of archaic hominins”*. Not all regions of the 
genome, however, are equally amenable to such study. Recurrent 
copy number variation (CNV) at chromosome 16p11.2 accounts 
for approximately 1% of cases of autism*” and is mediated by a 
complex set of segmental duplications, many of which arose recently 
during human evolution. Here we reconstruct the evolutionary 
history of the locus and identify bolA family member 2 (BOLA2) 
as a gene duplicated exclusively in Homo sapiens. We estimate that 
a 95-kilobase-pair segment containing BOLA2 duplicated across 
the critical region approximately 282 thousand years ago (ka), one 
of the latest among a series of genomic changes that dramatically 
restructured the locus during hominid evolution. All humans 
examined carried one or more copies of the duplication, which 
nearly fixed early in the human lineage—a pattern unlikely to have 
arisen so rapidly in the absence of selection (P < 0.0097). We show 
that the duplication of BOLA2 led to a novel, human-specific in- 
frame fusion transcript and that BOLA2 copy number correlates 
with both RNA expression (r= 0.36) and protein level (r= 0.65), 
with the greatest expression difference between human and 
chimpanzee in experimentally derived stem cells. Analyses of 152 
patients carrying a chromosome 16p11.2 rearrangement show that 
more than 96% of breakpoints occur within the H. sapiens-specific 
duplication. In summary, the duplicative transposition of BOLA2 
at the root of the H. sapiens lineage about 282 ka simultaneously 
increased copy number of a gene associated with iron homeostasis 
and predisposed our species to recurrent rearrangements associated 
with disease. 

To reconstruct the evolutionary history of the chromosome 16p11.2 
region, we generated complete, reference-quality genome sequences® 
(Supplementary Table 1) for one orangutan, two chimpanzee and three 
human haplotypes (Fig. 1a and Extended Data Fig. 1). Comparison 
with mouse establishes the orangutan configuration as ancestral. In 
both humans and chimpanzees, the region has been independently 
restructured, nearly doubling in length primarily by the differential 
accumulation of segmental duplications (Fig. 1a and Extended Data 
Fig. 1a). We find that six inversions have occurred in the African great 
apes within chromosome 16p11.2 (Extended Data Figs 2-4 and 
Supplementary Tables 2 and 3), a nonrandom clustering (P< 1x 10~), 
with breakpoints mapping near an ~20-kilobase-pair (kbp) low-copy 


repeat 16a (LCR16a) core duplicon. The core encodes a positively 
selected gene family (NPIP) that emerged on the human-African 
great ape lineage’. Only within the human lineage do large (>100 kbp) 
segmental duplications exist in a direct orientation flanking the autism 
critical region at breakpoint regions BP4 and BP5 (Extended Data 
Fig. 5a and Supplementary Table 4)%, implying that susceptibility to 
large-scale CNV associated with disease**? arose specifically within 
the human species. 

Structural differences between human haplotypes are largely 
restricted to integral changes in the copy number of a 102-kbp block 
within both the proximal and distal breakpoint regions (Extended Data 
Fig. 1b). This block is composed of two different segmental duplications 
originating from chromosome 16: a 72-kbp segment duplicated from 
chromosome 16p12.1 carrying NPIP and a portion of the SMG1 serine- 
threonine kinase gene (SMG1P) and a 30-kbp segment carrying three 
intact genes: BOLA2, SLX1 and SULT1A3 (Fig. 1a and Extended Data 
Fig. 1b). More than a dozen large-scale structural changes, including six 
duplicative transpositions (>830 kbp) from elsewhere on chromosome 
16, are required to reconcile the organization of human and chimpan- 
zee chromosome 16p11.2 (Extended Data Figs 3, 4 and Supplementary 
Table 3). Assuming a human-chimpanzee divergence time of 6 million 
years ago (Ma) (ref. 10) and a constant substitution rate, we estimate 
that a 95-kbp segment including BOLA2 duplicated across the critical 
region ~282ka (95% confidence interval 361-209 ka), around the time 
when H. sapiens emerged as a species'! (Figs 1b and 2a, Extended Data 
Fig. 6 and Supplementary Tables 5-7). 
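
The dating logic in the preceding paragraph can be sketched as a simple molecular-clock scaling. This is a deliberate simplification, assuming that the divergence between the duplicated copies and the human-chimpanzee divergence are measured over the same aligned sequence and accumulate at the same constant rate. The two divergence values below are hypothetical, chosen only so that the arithmetic reproduces the quoted point estimate, and the function name is illustrative; the authors' phylogenetic procedure is not reproduced here.

```python
def duplication_age_ka(d_paralogues, d_human_chimp, divergence_ma=6.0):
    """Scale a paralogue divergence to time assuming a constant substitution rate.

    Both divergences are per-site substitution estimates over the same aligned
    sequence; the age is returned in thousands of years (ka).
    """
    if d_human_chimp <= 0:
        raise ValueError("human-chimpanzee divergence must be positive")
    return divergence_ma * 1000.0 * d_paralogues / d_human_chimp


# Hypothetical per-site divergences (not values reported in the paper).
D_HUMAN_CHIMP = 0.0120
D_PARALOGUES = 0.000564

age = duplication_age_ka(D_PARALOGUES, D_HUMAN_CHIMP)
print(f"estimated duplication age: {age:.0f} ka")
```

Under this scaling the estimate is directly proportional to the assumed divergence time, so a different calibration shifts the duplication age by the same factor.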

We examined copy number diversity’* of the duplicated genes 
mapping to the 102-kbp cassette—BOLA2, SLX1 and SULT1A3—in 
humans, archaic humans and apes (Fig. 2b-c, Extended Data Fig. 7 and 
Supplementary Tables 8-10). We found that BOLA2 is duplicated in all 
H. sapiens individuals examined, including archaic representatives of 
Neolithic and Mesolithic populations", as well as the oldest sequenced 
archaic human, Ust’-Ishim, estimated to have lived 45 ka (ref. 14). In 
sharp contrast, BOLA2 is single copy (that is, diploid copy number = 2) 
in nonhuman primates and the archaic hominins Neanderthal’ and 
Denisova? (Fig. 2b—c and Supplementary Table 8), consistent with our 
phylogenetic point estimate of the duplication age. Human genomes 
contain from three to eight diploid BOLA2 copies, with at least one 
copy of the distal duplicate BOLA2B (range one to four copies; mean 
and median two) and at least two copies of the proximal ancestral 
BOLA2A (range two to five copies; mean and median four; Fig. 2c and 
Supplementary Table 8). 

Figure 1 | Comparative sequence analysis of chromosome 16p11.2 
among apes and the evolution of BOLA2 duplications in humans. 
a, Genomic organization of chromosome 16p11.2 for one orangutan and 
one chimpanzee haplotype and the human reference haplotype (GRCh37 
chr16:28195661-30573128). Blocks of segmental duplications within 
this locus mediate recurrent rearrangements in humans and have thus 
been defined as breakpoint regions BP1-BP5 (ref. 8). Coloured boxes and 
thick arrows indicate the extent and orientation of segmental duplications 
(different colours denote duplicons from different ancestral genomic 
loci; hashed boxes indicate sequence duplicated in humans but not in 
the species represented). Thin numbered arrows show orientations of 
gene-rich regions of unique sequence. Red triangles indicate locations 
and orientations of NPIP cores. Numbers (left) indicate the size of each 
haplotype, with the number of segmentally duplicated base pairs shown 
in parentheses. For chimpanzee, the size is a lower bound owing to 
gaps (dotted line sections) and the contig not reaching unique region 1. 
Regions of human CNV (yellow highlight) occur on both sides of 
the critical region and involve the same 102-kbp unit: a 30-kbp block 
(green arrow) containing BOLA2, SLX1 and SULT1A3 and a 72-kbp 
block (orange arrow) harbouring SMG1P. Expansion and contraction 
of this cassette underlie hundreds of kilobase pairs of structural 
diversity between human haplotypes. b, A model for the emergence of 
BOLA2 duplications during H. sapiens evolution. It depicts structural 
changes over time leading to the present-day human architecture. A full 
evolutionary model detailing the dynamic evolution of chromosome 
16p11.2 in great apes is provided in the Supplementary Information and 
Extended Data Figs 3 and 4. 

Figure 3 | BOLA2 expression analyses. a, BOLA2 mRNA expression 
quantifications”° in 366 LCLs from individuals genotyped for BOLA2 
paralogue-specific copy number. Points indicate expression levels and 
copy number (jittered) for each cell line; horizontal lines show mean 
levels for each copy number. Dashed line shows least squares regression. 
Point colours indicate BOLA2B copy number (pink, one copy; black, two 
copies; cyan, three copies). Groups with the same aggregate BOLA2 copy 
number but different combinations of paralogue-specific copy number 
do not exhibit differential expression, consistent with both paralogues 
producing mRNA. RPKM, reads per kilobase of transcript per million 
mapped reads. b, Plot layout as in a, but data show BOLA2 protein 
expression quantified by western blot densitometry on protein lysates 
from 34 LCLs. No evidence indicates differential protein expression of 
distinct BOLA2 paralogues. c, BOLA2 gene models, predicted protein 
products and support from RNA-seq data from human iPSCs. RT-PCR, 
cloning and capillary sequencing experiments identified three BOLA2 
isoforms: the canonical isoform (BOLA2, black) and two fusion isoforms 
consisting of the first two exons from canonical BOLA2 fused with three 


In light of its recent origin and its potential to promote disease-causing 
rearrangement, we considered it remarkable that 99.8% of humans 
carry four or more copies of this segment. Ancient humans such as 
Ust’-Ishim as well as some of the oldest branches of modern humans 
(for example, San and Biaka pygmy’») typically carry five or six 
copies, indicating that it spread rapidly early in human history. We 
modelled various evolutionary scenarios by simulation on the basis of 
the observed genotypes and a realistic model of human demographic 
history (Extended Data Fig. 8a), assuming neutral evolution!®'*. The 
observed genotypes or genotypes with higher BOLA2B frequencies 
only in humans were improbable (P < 0.0097; Extended Data Fig. 8b),
even when the duplication age parameter was varied by an order of 
magnitude. Scenarios incorporating recurrent duplication were also 
deemed unlikely (P < 0.0062). We next implemented a model incor- 
porating the 282 ka age estimate but varying the selection coefficient 
(s) as an input parameter, yielding a maximum likelihood estimate of 
s=0.0015 (Extended Data Fig. 8c). Interestingly, the unique ~550-kbp 
critical region flanked by BOLA2 duplications showed signatures 
consistent with a region under positive selection: the absence of archaic 
introgression”, low diversity (bottom 2.7%) and an excess of rare variants 
(Extended Data Fig. 8d-e). 
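As a rough cross-check on why neutrality looks implausible, Kimura's diffusion approximation gives the fixation probability of a single new mutation. The sketch below is not the coalescent simulation used in the paper (ms and msms). It assumes a constant diploid population, additive selection, and an effective size of 24,000 borrowed from the human-ancestor parameter of the demographic model in Extended Data Fig. 8a. The function and variable names are illustrative only.

```python
import math


def fixation_probability(s: float, n_e: int) -> float:
    """Kimura's diffusion approximation for one new mutant allele.

    Assumes a diploid population of constant effective size n_e, an
    initial frequency of 1 / (2 * n_e) and additive (genic) selection
    with coefficient s. These are simplifying assumptions, not the
    demographic model used in the paper.
    """
    if s == 0:
        # Neutral case: fixation probability equals the starting frequency.
        return 1.0 / (2 * n_e)
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * n_e * s))


N_HUM = 24_000   # assumed: human-ancestor effective size (Extended Data Fig. 8a)
S_MLE = 0.0015   # maximum likelihood selection coefficient reported in the text

p_neutral = fixation_probability(0.0, N_HUM)
p_selected = fixation_probability(S_MLE, N_HUM)
ratio = p_selected / p_neutral

print(f"neutral:  {p_neutral:.3e}")
print(f"selected: {p_selected:.3e}")
print(f"ratio:    {ratio:.1f}")
```

Under these assumptions a copy with s = 0.0015 is more than a hundred times as likely to fix as a neutral copy, although both probabilities remain small in absolute terms.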

Because humans show extensive CNV, we assessed whether copy 
number correlated with messenger RNA (mRNA) and protein levels. 
We found a significant correlation between BOLA2 copy number and 
expression at the RNA level from analysis of 366 lymphoblastoid cell 
lines (LCLs)”° (r = 0.36, P = 2.09 × 10⁻¹²; Fig. 3a and Supplementary
Tables 11 and 12) and at the protein level from analysis of whole-
protein lysates from 34 LCLs (r = 0.64, P = 4.34 × 10⁻⁵; Fig. 3b and
Supplementary Tables 13 and 14). 
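The link between a correlation coefficient, the sample size and its significance can be sketched as follows. Pearson's r is computed from paired values, and the test statistic t = r·sqrt((n − 2)/(1 − r²)) is then compared against a t distribution with n − 2 degrees of freedom. The copy number and expression values below are invented for illustration and are not the study data; only the reported r and n values come from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """t statistic for testing r against zero with n paired observations."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical copy numbers and expression values, for illustration only.
toy_copy_number = [3, 4, 4, 5, 5, 6, 6, 7]
toy_expression = [8.1, 9.5, 10.2, 11.0, 10.4, 12.3, 11.8, 13.9]
r_toy = pearson_r(toy_copy_number, toy_expression)

# Values reported in the text: (r, n) for mRNA and protein.
t_mrna = t_statistic(0.36, 366)
t_protein = t_statistic(0.64, 34)
print(round(r_toy, 3), round(t_mrna, 2), round(t_protein, 2))
```

A modest correlation across 366 cell lines yields a larger t statistic than a stronger correlation across 34, which is why the mRNA result carries the smaller P value.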




exons from SMG1P. One fusion isoform (BOLA2F, blue) maintains the 
BOLA2 open reading frame well beyond the fusion junction, whereas a 
third isoform (BOLA2T, red) contains a premature stop codon within 
the first SMG1P-derived exon. Numbers next to curved lines indicate 
mean counts of RNA-seq reads from two human iPSCs (two independent 
clones each) supporting each exon-exon junction, with standard errors in 
parentheses. nt, nucleotides; aa, amino acids. d, RNA-seq quantification 
of BOLA2 (black) and BOLA2F (blue) mRNA expression through in vitro 
differentiation of primate iPSCs into neurons. Data from two human
and two chimpanzee cell lines (two independent clones each, except
for neurons) reveal higher levels of BOLA2 transcripts in human iPSCs
than in chimpanzee iPSCs and that BOLA2 RNA levels decrease through
neuronal differentiation. Bar heights indicate mean expression levels for
each species and differentiation stage in transcripts per million (TPM);
error bars, s.e.m. BOLA2 expression in human embryonic stem cells
(two cell lines) is consistent with data from human iPSCs. ESCs,
embryonic stem cells; NPCs, neural progenitor cells. 
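The two expression units used in this figure, RPKM (panel a) and TPM (panel d), differ mainly in the order of normalization. A minimal sketch using the standard definitions follows. The read counts and transcript lengths are invented for illustration, and real quantification (Kallisto in this study) works from estimated counts and effective lengths rather than the raw values assumed here.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    return [c * 1e9 / (length * total_reads)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [rate / total_rate * 1e6 for rate in rates]


# Hypothetical transcripts: read counts and lengths in base pairs.
toy_counts = [100, 200]
toy_lengths = [1000, 2000]

toy_rpkm = rpkm(toy_counts, toy_lengths)
toy_tpm = tpm(toy_counts, toy_lengths)
print(toy_rpkm, toy_tpm)
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than RPKM values.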




We also performed reverse transcription PCR (RT-PCR) and iden- 
tified an alternative gene structure composed of the first two exons 
from BOLA2 joined with three novel 3’ exons from an older segmental 
duplication containing SMG1P (Fig. 3c). This fusion isoform contains 
an open reading frame predicted to encode a 217-residue protein, 
including 53 residues from BOLA2 and 164 residues from SMG1P. 
Both canonical and fusion transcripts are co-expressed in a wide variety 
of tissues and developmental stages (Extended Data Fig. 9). Although 
the predicted fusion protein cannot be detected by existing antibodies, 
it is interesting that ribosome profiling data provide evidence that the 
mRNA is translated (Supplementary Table 15). Importantly, since 
the ancestral BOLA2 at BP5 lacked the SMG1P duplication down- 
stream, the origin of the fusion product must have coincided with the 
juxtaposition of BOLA2 and SMG1P by the tandem 102-kbp segmental 
duplication ~650-300 ka at BP5. We conclude that this fusion isoform 
is H. sapiens-specific. 

BOLA2 was previously identified as one of the top 50 genes differ- 
entially expressed between humans and nonhuman apes in induced 
pluripotent stem cells (iPSCs)*', implying that this gene might be par- 
ticularly relevant early in development. On the basis of our character- 
ization of the different BOLA2 isoforms, we revisited this observation 
by quantifying BOLA2 mRNA levels by RNA sequencing (RNA-seq) 
in human and chimpanzee iPSCs, iPSC-derived neural progenitor 
cells and 8-week-old neurons. Remarkably, we found the greatest 
differences in canonical BOLA2 expression at the iPSC state (twofold) 
and to a lesser extent in neural progenitor cells (1.5-fold) (Fig. 3d and 
Supplementary Table 16). Quantification of BOLA2 expression in two 
primary human embryonic stem cell lines revealed transcript levels 
comparable to human iPSCs (Fig. 3d and Supplementary Table 16). 
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Figure 4 | Refinement of chromosome 16p11.2 rearrangement 
breakpoints. a, WGS results for a family with a de novo chromosome 
16p11.2 microdeletion in a child with autism. Normalized read depth 
at unique 30-mer positions in the human reference genome GRCh37 
is depicted for the proband, her mother and her father. Read-depth 


In contrast, examination of a panel of adult tissues”? revealed no
substantial differences in BOLA2 mRNA levels between human and
chimpanzee (Extended Data Fig. 9d). As expected, expression of the fusion
BOLA2-SMG1P transcript was detected exclusively in human.

The duplication of BOLA2 across the critical region expanded 
threefold the size of flanking high-identity, directly oriented sequence 
blocks (Extended Data Fig. 5a-b and Supplementary Tables 4, 17
and 18), theoretically predisposing the locus to recurrent CNV via 
unequal crossover (Extended Data Fig. 5c) specifically in the human 
lineage. To test this, we refined breakpoint locations in patients with 
autism and developmental delay carrying either the chromosome 
16p11.2 microduplication or microdeletion event?’. Using whole- 
genome sequence (WGS) data and a molecular inversion probe 
(MIP) assay”4, we localized breakpoints in 152 patients correspond- 
ing to 105 independent rearrangement events (Fig. 4a, Extended Data 
Fig. 10 and Supplementary Table 19). We found 96% (101 out of 105) 
of the disease-causing rearrangement breakpoints map within the 
H. sapiens-specific duplication containing BOLA2 (Fig. 4b). Thus, the 
expansion of this segment rendered the chromosome 16p11.2 locus 
susceptible to recurrent rearrangement. 

In summary, the level of genetic difference between humans and 
chimpanzees for chromosome 16p11.2 stands in sharp contrast to the 
oft-quoted 99% genetic identity between the species. The region has 
undergone extensive inversion and duplication, including a 95-kbp 
segment containing BOLA2 that duplicated after our divergence with 
ancient hominins. This event contributes more derived sequence 
specific to H. sapiens than 35,500 previously reported human- 
specific single-nucleotide variants and indels combined’. The rapid 
rise and dispersal of this duplicated segment at the root of H. sapiens 
(~282 ka) are unlikely to have occurred under neutral evolution but 
rather are consistent with modest positive selection (s=0.0015). The 
estimated strength of selection on the BOLA2 duplication is an order 
of magnitude weaker than what is typically observed for recent pos- 
itive selection (such as the emergence of lactase persistence ~10 ka 
(ref. 25)) but an order of magnitude stronger than nearly neutral 
mutations. Remarkably, the BOLA2 duplication rapidly rose to high 
frequency in humans despite predisposing our species to recurrent 
CNV associated with disease. The expansion of this segment resulted in 
the formation of a novel fusion transcript and dramatic BOLA2 expres- 
sion differences between chimpanzee and human iPSCs. Although the 
phenotypic consequences of increased BOLA2 expression and the novel 
fusion transcript await future in vivo characterization, it is known that 
BOLA2 physically interacts in a heterotrimeric complex with GLRX3 
(glutaredoxin 3)°. This complex is conserved from prokaryotes to 
humans”’ and was shown to have a role in iron sensing in yeast”*®. In 
vertebrates, BOLA2 has been hypothesized to play important roles in 
iron regulation” and iron-sulfur protein biogenesis*’. We speculate 
that the expansion of this conserved gene may enhance iron utilization 
and homeostasis, especially during human embryonic development. 
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signatures reveal a deletion in the proband extending between but not 
beyond the H. sapiens-specific duplicated sequences (highlighted in 
pink). b, Summary of results across 105 independent microdeletion and 
microduplication events from 152 individuals; ~96% of breakpoints map 
to the H. sapiens-specific segmental duplication. 
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METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Single-molecule, real-time sequencing was used to generate high-quality 
sequence® from bacterial artificial chromosome clones obtained from genomic 
libraries. Clone sequences were assembled using HGAP and error-corrected 
using Quiver*!. Contig assembly was performed using Sequencher (Gene Codes 
Corporation, Ann Arbor, Michigan) and validated by FISH. Copy number 
genotyping of genes and segmental duplications was performed using a read-depth 
method” and WGS data from humans*?**, nonhuman primates*4 and archaic 
genomes”?!?14, as well as single-molecule MIPs* targeted to paralogous sequence 
variants”*. We estimated evolutionary timing of segmental duplication events on
the basis of comparative sequencing and phylogenetic analyses (neighbour-joining 
method), adjusting branch lengths for trees that failed the Tajima’s relative rate 
test and assuming divergence times of 6 Ma (human-chimpanzee)'° and 15 Ma 
(human-orangutan). Evolutionary conservation analysis of BOLA2 was performed 
by maximum likelihood (PAML). Likelihoods of BOLA2B fixation under different 
scenarios were assessed using the coalescent simulators ms!” and msms'®, adapt- 
ing a previously published demographic model!®. BOLA2 copy number estimates 
were correlated (Pearson's r) using RNA-seq quantifications”’ (PEER-normalized 
RPKM) and western blot BOLA2 densities in human LCLs grown in complete 
RPMI medium and lysed in RIPA buffer. After SDS-PAGE and transfer to PVDF 
membrane, blots were incubated with an anti-BOLA2 antibody (Santa Cruz 
Biotechnology, Dallas, Texas) and an anti-actin antibody (Sigma) for normalization 
purposes. Band densities were quantified using the Bio1D software. BOLA2 coding
DNA sequence (CDS) was cloned using the Gateway system (Invitrogen, Carlsbad, 
California). HeLa cells were transfected with cytomegalovirus-BOLA2 CDS (both 
10 and 17 kDa forms) and analysed by western blotting. BOLA2 gene models were 
established via RT-PCR, cloning and capillary sequencing. RNA-seq data were
generated from previously described embryonic stem cell and iPSC lines”, as
well as iPSC lines differentiated into neural progenitor cells and neurons. BOLA2 
mRNA expression was quantified in transcripts per million with Kallisto** (version 
0.42.1) using a custom catalogue of transcripts including all human RefSeq tran- 
scripts with the three BOLA2 isoforms. Breakpoints of chromosome 16p11.2 rear- 
rangements were refined using Illumina whole-genome shotgun sequencing*”** 
and single-molecule MIP analysis”**>*” of patient DNA obtained from the Simons 
VIP” and Simons Simplex Collection®’. All procedures for clinical assessment and 
blood extraction were approved by the institutional review boards of participating 
institutions, and informed consent was obtained for participation in this research. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


Extended Data Figure 1 | Comparative sequence analysis of 
chromosome 16p11.2 among apes. a, Genomic organization of 
chromosome 16p11.2 for one orangutan and two chimpanzee haplotypes 
and the human reference haplotype (GRCh37 chr16:28195661-30573128; 
see ideogram for approximate chromosomal location). Blocks of segmental 
duplications within this locus mediate recurrent rearrangements in 
humans; thus, these blocks have been defined as breakpoint regions 
BP1-BP5 (ref. 8). The ~550-kbp critical region (pink) and a >1-Mbp 
chimpanzee-specific inversion polymorphism (orange) are highlighted. 
Tiling paths of sequenced clones are indicated above each haplotype, with 
chimpanzee clones that could not be fully resolved marked with asterisks. 
Coloured boxes and thick arrows indicate the extent and orientation of 
segmental duplications (with different colours denoting duplicons from 
different ancestral genomic loci and hashed boxes indicating sequence 
duplicated in humans but not in the species represented). Thin numbered 
arrows show orientations of gene-rich regions of unique sequence. 
Numbers (left) indicate the size of each orthologous haplotype, with
the number of segmentally duplicated base pairs shown in parentheses.
Note that, for chimpanzee, these sizes are lower bounds owing to gaps
in the contigs (dotted line sections) and the contigs not reaching unique
sequence beyond BP1 (that is, unique region 1). b, Distinct human 
structural haplotypes over the chromosome 16p11.2 critical region and 
flanking sequences (three complete haplotypes extending from unique 
sequence distal to BP3 to unique sequence proximal to BP5 and one partial 
haplotype including BP3-BP4 and BP5 sequence contigs). High-quality 
sequence for each haplotype was generated by sequencing a total of
40 bacterial artificial chromosomes and 15 fosmids from three different
human genomic libraries. Regions of CNV (highlighted in yellow along the
first two haplotypes) occur on both sides of the critical region and involve
the same 102-kbp unit in direct orientation, including a 30-kbp
block containing BOLA2 and two other genes and a 72-kbp block
harbouring a partial segmental duplication of SMG1 (SMG1P). Expansion 
and contraction of this cassette underlie hundreds of kilobase pairs of 
structural diversity between human haplotypes. BOLA2 paralogue-specific 
copy number genotype data suggest that H1 and H3 probably represent the 
most common haplotype structures in humans. 
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Extended Data Figure 2 | Comparison of chromosome 16p11.2 
structure between apes. a, Sequences (thin horizontal lines) from human 
(GRCh37 chr16:28195661-30573128) and orangutan (contig sequence) 
at chromosome 16p11.2 are compared using Miropeats (s = 1,000) and 
annotated with locations of human segmental duplications and FISH 
probes used to validate the organization of the region. Lines connecting 
the sequences show regions of homology, and line colours highlight 
differences in the order and orientation of distinct gene-rich regions
of unique sequence across the locus (numbered 1-6). Numbers below
FISH probes correspond to numbers within the images on the right,
specifying which probes were used in each experiment. Experiment 1
used the same probes as experiment 3, and experiment 2 used the
same probes as experiment 4. Three-colour interphase FISH on human
and orangutan chromosomes confirms the accuracy of our assembled
orangutan contig. b, Sequences (thin horizontal lines) from human
(GRCh37 chr16:28195661-30573128) and two chimpanzee structural
haplotypes at chromosome 16p11.2 are compared using Miropeats
(s = 1,500) and annotated with locations of human segmental duplications
and FISH probes used to validate the organization of the region. Thick
red horizontal lines indicate gaps in the chimpanzee contigs, and black
boxes correspond to chimpanzee-specific segmental duplications (that is, 
sequences not duplicated in humans). Lines connecting the sequences 
show regions of homology, and line colours highlight differences in the 
order and orientation of distinct gene-rich regions of unique sequence 
across the locus (numbered 2-6). Numbers below FISH probes correspond 
to numbers within the images on the right, specifying which probes were 
used in each experiment. Grey rectangles show mapping locations of 
FISH probes in human. Three-colour interphase FISH on chimpanzee 
chromosomes confirms the accuracy of our assembled contigs. 
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Extended Data Figure 3 | Dynamic evolution of human chromosome 
16p11.2. a, A model for the evolution of the chromosome 16p11.2 BP1-BP5 
region® during great ape evolution. The schematic depicts structural 
changes over time leading to the present-day human architecture (see 
Supplementary Information for details). The orangutan structure (top) is 
largely devoid of segmental duplications and deemed to represent the ape 
ancestral organization because it is conserved with mouse. Subsequent 
steps were inferred on the basis of phylogenetic reconstruction, origins of 
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the duplicated sequences and the most parsimonious path with respect to 
changes in gene order (inversions). (See Supplementary Information for 
a detailed discussion of all supporting evidence and confidence levels for 
each step.) Note that, without access to genomes containing intermediate 
chromosome 16p11.2 structures, it is impossible to know with certainty 
the entire step-by-step evolutionary history. Some details presented here 
may not be accurate. mya, million years ago; kya, thousand years ago. 
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Extended Data Figure 5 | Comparison of duplications around the 
chromosome 16p11.2 autism critical region among apes and nonallelic 
homologous recombination (NAHR) model underlying CNV at human 
chromosome 16p11.2. a, Local directly oriented (green) and inversely 
oriented (blue) intrachromosomal segmental duplications flanking the 
chromosome 16p11.2 autism critical region (purple) are visualized using 
Miropeats (s = 1,000). Gaps in the chimpanzee C1 contig are shown in red. 
The smaller size (<50 kbp) and lower average sequence identity
(at most 98.6%) of directly oriented duplications flanking the critical
region in chimpanzee compared with human haplotypes including BOLA2
on both sides of the critical region (at least 147 kbp of directly oriented
duplications having at least 99.3% average sequence identity) suggest that
susceptibility to NAHR resulting in microdeletions and microduplications
at this locus evolved specifically in humans. b, Perfect sequence identity
tract lengths (>500 bp) within directly oriented duplications flanking the 
critical region for human versus chimpanzee. Histograms show counts of 
tracts of perfect sequence identity (lacking single-nucleotide variants and 
indels) between directly oriented segmental duplications of interest within 
each indicated haplotype and the distribution of these tracts over different 
size ranges. Human haplotypes having BOLA2 on both sides of the critical 
region (H1 and H3) contain the highest number of such tracts and the 
longest such tracts, including one tract spanning 10,774 bp. In contrast, the 
longest tract of perfect sequence identity between duplications of interest 
in chimpanzee (considering both the C1 and C2 haplotypes) spans 450 bp. 
c, NAHR model underlying normal and disease-associated CNV at human 
chromosome 16p11.2. 
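The tract analysis in panel b amounts to scanning a pairwise alignment of two duplicated blocks for uninterrupted runs of identical bases. A minimal sketch is given below, assuming the two sequences are already aligned to equal length with "-" marking gaps. The function name, the threshold parameter and the toy sequences are illustrative and are not taken from the study.

```python
def perfect_identity_tracts(seq_a, seq_b, longer_than=500):
    """Return (start, end) half-open intervals of perfect identity.

    seq_a and seq_b are aligned sequences of equal length. A tract is
    broken by any mismatch or gap ("-"), i.e. by a single-nucleotide
    variant or an indel. Only tracts longer than `longer_than` columns
    are reported.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    tracts = []
    start = None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == b and a != "-":
            if start is None:
                start = i          # a new tract begins here
        else:
            if start is not None and i - start > longer_than:
                tracts.append((start, i))
            start = None           # mismatch or gap ends the tract
    if start is not None and len(seq_a) - start > longer_than:
        tracts.append((start, len(seq_a)))
    return tracts


# Toy alignment with one mismatch at column 5.
toy_a = "ACGTACGTACGT"
toy_b = "ACGTATGTACGT"
toy_tracts = perfect_identity_tracts(toy_a, toy_b, longer_than=4)
print(toy_tracts)
```

Tract lengths are the differences end − start, so a histogram like the one in panel b can be built by binning those lengths for each haplotype.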


[Extended Data Figure 6 graphic. Panels a-c plot sequence identity against
alignment coordinate, with the 30-kbp, 72-kbp and 45-kbp blocks, NPIP cores,
the SMG1P-BOLA2 junction, a 36-kbp interlocus gene conversion tract and the
putative duplication breakpoints marked. Panel d shows 7 kbp duplicated
within BP4 (replicated twice) and 95 kbp duplicated from BP5 to BP4. Panel e
shows the phylogenetic tree of human BOLA2A and BOLA2B sequences with
chimpanzee, gorilla and orangutan, with the duplication dated to ~282 kya.]

T = ((0.000258 subs/site)/(0.000258 + 0.005536 + 0.004950 + 0.000213 subs/site)) × (12 million years) = ~0.282 million years

Extended Data Figure 6 | Sequence refinement of interspersed BOLA2 
duplication breakpoints, inference of BOLA2 duplication mechanism 
and phylogenetic BOLA2 duplication timing. a, H1 human BP4 
sequence (orange, green, orange and blue arrows in inset) was aligned
to its allelic (black arrows in inset) and paralogous (red arrows in inset)
counterparts. The sequence identity for each alignment was computed 
and plotted over 2-kbp windows, sliding by 100 bp. Black lines indicate 
sequence identity for allelic comparisons, whereas red lines correspond to 
paralogous comparisons. While the allelic comparisons exhibit uniform, 
near-perfect sequence identity across the entirety of the alignment, 
paralogous comparisons reveal three distinct levels of sequence identity, 
with the highest level in the middle. This pattern suggests that the 
BOLA2 duplication (highest-identity region, 95 kbp) landed within an 
evolutionarily older segmental duplication having paralogues at BP4 and 
BP5. Dashed vertical lines (numbered i-iv) indicate putative breakpoints 
for events that occurred after this older segmental duplication. Junction 
sequence from the BP5 102-kbp tandem duplication (that is, the SMG1P- 
BOLA2 junction) was clearly included in the 95-kbp duplication from BP5 
to BP4. b, Alignment of BP4 sequences containing the putative left (red 
arrows in inset) and right (dark blue arrows in inset) BOLA2 duplication 
breakpoints to the BP5 paralogue associated with the evolutionarily older 
segmental duplication (orange and light blue arrows in inset) and sliding
window sequence identity analysis supports the hypothesis outlined above.
Sequence identity lines for comparisons involving left and right BP4
sequences intersect in the vicinity of the hypothesized BOLA2 duplication
breakpoints. Comparing this result with the same analysis of the human
H2 BP4 sequence lacking BOLA2 (green arrows in inset and green
identity line) suggests this BP4 sequence represents the ancestral state of
BP4 before the BOLA2 duplication arrived. Thus, two levels of sequence
identity existed between BP4 and BP5 before the BOLA2 duplication,
consistent with an interlocus gene conversion event. c, Alignment of
BP4 sequences (orange arrows in insets) containing the putative BOLA2
duplication breakpoints to their ancestral BP4 (top plot) and their
ancestral BP5 (middle plot) counterparts and sliding window sequence
identity analysis reveals an ~7-kbp window (highlighted in orange) 
defining the BOLA2 duplication breakpoints. Analysis of the underlying 
multiple sequence alignment (Supplementary Table 5) identified positions 
with signatures informative for breakpoint localization (blue vertical lines, 
left BP4 72-kbp block outside the BOLA2 duplication and right BP4 
72-kbp block within the BOLA2 duplication; yellow vertical lines, left BP4 
72-kbp block within the BOLA2 duplication and right BP4 72-kbp block 
outside the BOLA2 duplication). Grey vertical lines indicate positions 
showing signatures of interlocus gene conversion. As both left and
right 72-kbp block BP4 sequences within the ~7-kbp window are more
highly identical to ancestral BP4 sequence (20/24 informative positions
match the ancestral BP4 sequence) than to ancestral BP5 sequence,
it is likely that this interval was involved in the BOLA2 duplication
but duplicated only within BP4. Its boundaries define the most likely
BOLA2 duplication breakpoints, and this pattern of sequence identity 
suggests a template-switching replicative mechanism as most probably 
underlying the BOLA2 duplication event. d, Template-switching model 
for the formation of BOLA2B. This mechanism was inferred from the 
sequence identity analyses in a-c and from analysis of a multiple sequence 
alignment (Supplementary Table 5). e, Phylogenetic characterization of 
the 95-kbp duplication containing BOLA2 from BP5 to BP4. Cladogram 
representation of an unrooted neighbour-joining phylogenetic tree based 
on a 21,102-bp multiple sequence alignment spanning BOLA2 and most 
of the 30-kbp block including human sequences from BP4 and BP5
and single-copy orthologous sequences from chimpanzee, gorilla and
orangutan. Branch lengths (substitutions per site) are shown on each 
branch (black decimal numbers), and bootstrap support is indicated (black 
integers at nodes). Blue numbers correspond to nodes and indicate average 
branch lengths for all sequences in corresponding clades. Branch lengths 
were used to estimate the time corresponding to the 95-kbp duplication 
containing BOLA2 from BP5 to BP4 as shown. 
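The sliding-window identity profiles described in panels a-c (2-kbp windows advanced in 100-bp steps) can be sketched as below. This is an illustrative reimplementation rather than the authors' code. It assumes a pairwise alignment of equal-length strings and counts gap columns as mismatches, which is one of several possible conventions.

```python
def sliding_identity(seq_a, seq_b, window=2000, step=100):
    """Fraction of identical columns per window along a pairwise alignment.

    Returns a list of (window_start, identity) tuples. Columns where
    either sequence has a gap ("-") are counted as mismatches, an
    assumption made for this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(seq_a) - window + 1, step):
        end = start + window
        matches = sum(
            1 for a, b in zip(seq_a[start:end], seq_b[start:end])
            if a == b and a != "-"
        )
        profile.append((start, matches / window))
    return profile


# Toy alignment: identical first half, fully divergent second half.
toy_a = "A" * 10
toy_b = "A" * 5 + "C" * 5
toy_profile = sliding_identity(toy_a, toy_b, window=4, step=2)
print(toy_profile)
```

A sharp step in such a profile, like the drop in the toy example, is the kind of signal used in the caption to place putative breakpoints between regions of different duplication age.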
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Extended Data Figure 7 | Analyses of BOLA2 aggregate and paralogue- 
specific CNV in humans. a, Interphase FISH confirms both BOLA2A
and BOLA2B show CNV. Previous interphase FISH analysis (data not
shown) suggests the individual NA20127 has six total copies of BOLA2. 
Diagram outlines a three-colour FISH assay including two probes (blue, 
green) targeting sequences within the autism critical region and one probe 
(red) targeting ~18-kbp of sequence (including BOLA2) over the 30-kbp 
duplication block. Signals from the red probe are detected on the telomeric 
(BP4) and centromeric (BP5) sides of the critical region (adjacent to the 
blue and green probes, respectively) on both chromosome 16 homologues. 
However, the red probe signal intensity is strongest adjacent to the green 
probe for one homologue but, in contrast, is strongest adjacent to the
blue probe for the other chromosome 16 homologue, consistent with
higher BOLA2A copy number in the first case and higher BOLA2B copy 
number in the second case. These data indicate that individual NA20127 
has three copies each of BOLA2A and BOLA2B. This differential signal 
intensity pattern does not result from an inversion of the chromosome 
16p11.2 critical region in this individual, as data from another FISH 
experiment (data not shown) refute this possibility. Information on 
probes used in these FISH experiments is provided in Supplementary 
Table 2. b, Interphase FISH experiments using a probe targeting BOLA2 
and surrounding sequence for individuals having the lowest (three)
and highest (eight) confirmed aggregate BOLA2 copy numbers. c, Left
and middle schematics detail three distinct sectors of the 72-kbp blocks
(orange arrows). Each block has paralogous sequence variants that are
informative for particular region(s) compared with others in chromosome
16p11.2. These markers are colour-coded into three sectors within the
72-kbp block of paralogy (a 59-kbp sector, blue and red boxes; a 7-kbp 
sector, green and orange boxes; and a 6-kbp sector, purple and yellow boxes), 
indicating which particular regions they distinguish. Right schematic 
shows known haplotype structures for individual NA12878. d, Analysing 
WGS data from NA12878 yields copy number estimates for BOLA2A and 
BOLA2B that match the known BOLA2 paralogue-specific copy number 
(PSCN) for this individual. Each point shows a relative marker-specific 
read count frequency (y axis) and its position within the duplication 
blocks (x axis). Point colours correspond to different marker sets for each 
sector, as diagrammed in c. Fractions indicate the relative copy number of 
each marker set. Estimates of 4/6 (red marker set) versus 2/6 (blue marker 
set) for the 59-kbp sector confirms the sequenced architecture (c) with an 
aggregate of four BOLA2 copies, and the estimate of 3/6 (orange marker 
set) confirms three copies of BOLA2A. WGS analysis also yields accurate 
PSCN estimates for the 45-kbp block. e, Using MIPs, we employed the 
same relative read-depth strategy. Genotyping results for the same sample 
as in d are shown, with additional markers (points not colour-coded as
in c and d) added on the basis of polymorphic variants (symbols indicate
different patterns of presence/absence among 72-kbp blocks, considering 
all such blocks from our four contiguous human haplotypes). MIP 
genotypes confirm WGS estimates (in d). f, BOLA2 PSCN genotypes 
(points, jittered around their integer values for clarity) were inferred from 
MIP sequence data for 894 humans. Numbers indicate total counts of 
individuals in each population having a particular BOLA2 PSCN genotype. 
Low-confidence estimates were excluded. 
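The relative read-depth idea behind panels d and e can be illustrated with a short sketch: the fraction of reads carrying a marker set's variant, multiplied by the total number of copies of the block, approximates the copy number of the paralogue that the marker set tags. This is a simplified illustration under stated assumptions, not the genotyping pipeline used in the study. The median summary, the function name and the marker fractions are all hypothetical.

```python
from statistics import median


def paralogue_copy_number(marker_fractions, total_copies):
    """Estimate the copy number tagged by one marker set.

    marker_fractions: per-marker fraction of reads carrying the variant
        that distinguishes this marker set (values between 0 and 1).
    total_copies: total copies of the duplication block in the genome.
    The median across markers is used to limit the effect of outliers,
    which is an assumption of this sketch.
    """
    if not marker_fractions:
        raise ValueError("at least one marker fraction is required")
    return round(median(marker_fractions) * total_copies)


# Hypothetical read-count frequencies for two marker sets over six copies.
red_set = [0.68, 0.65, 0.67]
blue_set = [0.31, 0.35, 0.33]

red_copies = paralogue_copy_number(red_set, 6)
blue_copies = paralogue_copy_number(blue_set, 6)
print(red_copies, blue_copies)
```

With fractions near 4/6 and 2/6, as in the caption's example for the 59-kbp sector, the two marker sets resolve to four and two copies respectively.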
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Extended Data Figure 8 | Population genetic modelling of the BOLA2B 
duplication and critical region analyses. a, Demographic model 
(adapted from ref. 16) used to simulate BOLA2B evolution under different 
scenarios. Nanc, effective population size of Homo ancestor, 21,600. Narc, 
effective population size of Neanderthal-Denisova ancestor, 500. Nhum,
effective population size of human ancestor, 24,000. Nye, effective size of
Yoruban population after expansion, 45,000. Nden, effective population
size of Denisova, 500. Nnea, effective population size of Neanderthal,
500. Nyri, effective size of extant Yoruban population, 10,000. Nsan,
effective size of extant San population, 10,000. T1, time of archaic hominin
divergence from modern humans, 650,000 years. T2, time of Neanderthal-
Denisova divergence, 525,000 years. Tdup, time of formation of BOLA2B,
282,000 years. T3, time of Yoruban-San divergence, 200,000 years. T4,
time of Yoruban population expansion, 157,500 years. T5, time of Yoruban
population decline, 37,500 years. b, Simulation results (n = 1,000,000)
assuming that the duplication that formed BOLA2B occurred once,
282 ka, along the modern human ancestral lineage and evolved under
neutrality compared with the observed genotype frequencies of BOLA2B 
in 8 San and 110 Yoruban haplotypes. Nearly all (999,531) simulations 
resulted in BOLA2B being lost from both populations; results from the 
remaining 469 simulations (black) are shown alongside the observed data 
(red, circled). Under this simple neutral model incorporating BOLA2B 
age, the observed BOLA2B frequency is never approached. c, Simulation 
was repeated exploring a range of selection coefficients from 0.0009 to 
0.0024 (increments of 0.0001), and the relative probability of the observed 
data under each scenario was calculated as the proportion of simulations 


yielding the observed BOLA2B genotypes among simulations where 
BOLA2B was not lost relative to the maximum such proportion for any 
single selection coefficient considered. The maximum likelihood estimate 
for the selection coefficient was s = 0.0015. Smoothed line is the LOESS 
regression curve. d, Low average heterozygosity of the chromosome 
16p11.2 BP4-BP5 critical region. Distribution of average heterozygosity 
values for 100,000 ~550-kbp regions of unique sequence randomly 
sampled with replacement from the autosomal genome compared with 
average heterozygosity values for the critical region (black line) and 
flanking unique sequences (coloured lines). The critical region lies in 

the bottom 2.6% of the distribution, showing low diversity consistent 
with potential positive selection. Bottom schematic indicates locations of 
the critical region and flanking unique regions in relation to segmental 
duplications across the locus—note that BOLA2A is located at BP5 and 
BOLA2B at BP4. Telo, telomeric; Centro, centromeric. e, Low Tajima’s D 
score for the chromosome 16p11.2 BP4-BP5 critical region. Distribution 
of Tajima’s D scores for 2,987 non-overlapping ~550-kbp regions across 
the genome compared with Tajima’s D scores for the critical region (black 
line) and flanking unique sequences (coloured lines). The critical region 
lies in the bottom 2.7% of the distribution, consistent with possible 
positive selection. The distribution is centred near −2 rather than 0 
because most single-nucleotide variants in the 1000 Genomes Project data 
set are rare variants having arisen during the large expansions of human 
populations over the past 100,000 years. Bottom schematic indicates 
locations of the critical region and flanking unique regions in relation to 
segmental duplications across the locus. 
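
The relative-probability calculation described for panel c can be sketched as follows. This is a minimal illustration and not the code used for the analysis: the function name `relative_likelihood` and the example counts are hypothetical, and only the normalization (proportion of matching simulations among those in which BOLA2B was not lost, divided by the maximum such proportion) follows the caption.

```python
def relative_likelihood(counts):
    """Relative probability of the observed data for each selection coefficient.

    `counts` maps a selection coefficient s to a tuple (n_match, n_retained):
    the number of simulations reproducing the observed genotypes and the
    number of simulations in which the duplicate was not lost.
    """
    # Proportion of retained simulations that match the observed data.
    proportions = {
        s: (n_match / n_retained if n_retained else 0.0)
        for s, (n_match, n_retained) in counts.items()
    }
    best = max(proportions.values())
    if best == 0:
        raise ValueError("no simulation matched the observed data")
    # Scale so that the best-supported coefficient has relative probability 1.
    return {s: p / best for s, p in proportions.items()}


# Made-up counts for three coefficients, for illustration only.
example = {0.0010: (20, 1000), 0.0015: (50, 1000), 0.0020: (30, 1200)}
rel = relative_likelihood(example)
s_hat = max(rel, key=rel.get)  # coefficient with the highest relative probability
```

With real simulation counts, the coefficient returned as `s_hat` would correspond to the maximum likelihood estimate reported in the caption.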


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Extended Data Figure 9 image. Recoverable labels: BOLA2; Heart, Cerebellum, Liver; bands at 17 kDa and 10 kDa; axis, Expression (TPM).] 

Extended Data Figure 9 | BOLA2 expression and antibody validation. 

a, RT-PCR expression profile for canonical BOLA2. The expected product 
size for canonical BOLA2 (838 bp) was observed in all eight human tissues; 
1kb + DNA ladder (Thermo Fisher). b, RT-PCR expression profile for 
BOLA2-SMG1 fusion product. The expected product size for the BOLA2 
fusion transcript (1,239 bp) was observed as a doublet in all tissues 

except skeletal muscle. Intensity of upper band differs between tissues; 
1kb + DNA ladder (Thermo Fisher). c, BOLA2 RNA-seq expression 
analysis. Canonical (BOLA2) and fusion transcripts (BOLA2F, BOLA2T) 
were assessed across 25 humans from GTEx RNA-seq data. Bar heights 
indicate mean expression levels for each tissue in transcripts per million 
with standard errors shown (error bars). Colours correspond to different 
BOLA2 isoforms as indicated. d, BOLA2 expression among primates 

in six adult tissues. Each point indicates a BOLA2 expression estimate 
from a single tissue sample, with samples obtained from a total of 


18 humans, 6 chimpanzees and 3 bonobos. Open circles correspond to 
individuals analysed in a single experiment, while closed shapes denote 
data from multiple experiments involving the same individual, with 

each distinct colour plus shape pattern showing all experiments for a 
particular individual. Horizontal lines show mean expression values for 
each species and tissue. Combined with our expression analyses of iPSCs, 
these data show BOLA2 expression differs substantially between human, 
chimpanzee and bonobo only in stem cells. e, Western blotting of HeLa 
cells transfected with the human BOLA2 annotated CDS and probed with 
an anti-BOLA2 antibody (Sc-163747). Whole-cell lysates of HeLa cells 
non-transfected with the overexpression construct (lane 1) and transfected 
with the human BOLA2 annotated CDS (lane 2) were probed with anti- 
BOLA2 antibody. Two bands with molecular weights of 10 and 17 kDa are 
identified, are more abundant in transfected cells and correspond to two 
BOLA2 protein isoforms arising from different translation start sites. 
Extended Data Figure 10 | Chromosome 16p11.2 rearrangement 
breakpoint refinement. a, NAHR between directly oriented segmental 
duplications at BP4 and BP5. This unequal crossover results in 
chromosome 16p11.2 microdeletions and microduplications (Extended 
Data Fig. 5c). Coloured arrows and boxes correspond to duplication blocks 
and sectors within them are colour-coded as in Extended Data Fig. 7c. 
Unequal crossover could occur in eight distinct regions with regard to 
duplication block and sector boundaries. Three such regions are located 
within the ~95-kbp H. sapiens-specific duplication (dashed lines). Only 
unequal crossover events outside the H. sapiens-specific duplication 
produce recombinants which have a sector with non-uniform marker- 
specific copy number across its extent. b, Relative marker-specific 

read count frequencies (points) determined from WGS analysis for a 
microdeletion proband. Fractions indicate relative marker-specific copy 
numbers, as in Extended Data Fig. 7d, and diagrams adjacent to the plot 
show inferred haplotype structures for each chromosome 16 homologue 
for this individual. Although the data in the plot provide only diploid 
genotypes (and not resolved haplotypes), the haplotypes suggested here 
reflect this genotype information together with data from the parents 

(not shown) and the assumption (supported by our PSCN data) that 
haplotypes which have two BOLA2A copies and a single BOLA2B copy 
are the most common. Because marker-specific copy number is uniform 
across each sector, unequal crossover breakpoints must have occurred 
within the H. sapiens-specific duplication. c, Breakpoint refinement based 
on MIP PSCN marker data. Plots show relative marker-specific read count 
frequencies (points) determined using MIPs for a typical microdeletion 


patient (left) and a typical microduplication patient (right). Shapes and 
colour code designate different markers, and fractions indicate relative 
marker-specific copy numbers (as in Extended Data Fig. 7). Because 
marker-specific copy number is uniform across each sector for both 
individuals, in both cases, unequal crossover breakpoints must have 
occurred within the H. sapiens-specific duplication. d, Data from an 
atypical patient where the breakpoints are inferred to map outside the 

H. sapiens-specific segmental duplication. The plots show paralogue- 
specific copy number for a chromosome 16p11.2 microdeletion proband, 
his sibling and his mother over a 45-kbp duplication block shared between 
BP4 and BP5. Paralogue-specific copy number was estimated using a 

MIP assay targeting 54 informative markers over this region, with data 
from 43 markers fixed among haplotypes H1-H4 shown (points). Dashed 
lines indicate calls inferred using an automated caller, which were also 
confirmed by visual inspection. Adjacent schematics indicate the inferred 
haplotypes for each individual on the basis of these data, with approximate 
breakpoint locations shown (arrows). The results demarcate the location of 
the unequal crossover interval on the basis of the reciprocal copy number 
transition between the BP5 (red) and BP4 (blue) 45-kbp block segmental 
duplications. In this case, the breakpoints clearly map to a 22-kbp region 
outside the typical hotspot. Analysis of the sibling suggests that this region 
was the site of an interlocus gene conversion event from BP5 to BP4, and 
data from the mother imply that chromosomes having this event were 
present in the paternal germline. DNA from the father was not available 
for testing. 
A trans-synaptic nanocolumn aligns neurotransmitter release to receptors 


Ai-Hui Tang1,2*, Haiwen Chen1,2,3*, Tuo P. Li1,2,3, Sarah R. Metzbower1,2, Harold D. MacGillavry4 & Thomas A. Blanpied1,2 


Synaptic transmission is maintained by a delicate, sub-synaptic 
molecular architecture, and even mild alterations in synapse 
structure drive functional changes during experience-dependent 
plasticity and pathological disorders!”. Key to this architecture is 
how the distribution of presynaptic vesicle fusion sites corresponds 
to the position of receptors in the postsynaptic density. However, 
while it has long been recognized that this spatial relationship 
modulates synaptic strength’, it has not been precisely described, 
owing in part to the limited resolution of light microscopy. Using 
localization microscopy, here we show that key proteins mediating 
vesicle priming and fusion are mutually co-enriched within 
nanometre-scale subregions of the presynaptic active zone. Through 
development of a new method to map vesicle fusion positions within 
single synapses in cultured rat hippocampal neurons, we find that 
action-potential-evoked fusion is guided by this protein gradient 
and occurs preferentially in confined areas with higher local density 
of Rab3-interacting molecule (RIM) within the active zones. These 
presynaptic RIM nanoclusters closely align with concentrated 
postsynaptic receptors and scaffolding proteins*®, suggesting 
the existence of a trans-synaptic molecular ‘nanocolumn’. Thus, 
we propose that the nanoarchitecture of the active zone directs 
action-potential-evoked vesicle fusion to occur preferentially at 
sites directly opposing postsynaptic receptor-scaffold ensembles. 
Remarkably, NMDA receptor activation triggered distinct phases 
of plasticity in which postsynaptic reorganization was followed by 
trans-synaptic nanoscale realignment. This architecture suggests a 
simple organizational principle of central nervous system synapses 
to maintain and modulate synaptic efficiency. 

The location of vesicle fusion within an active zone is probably dic- 
tated by a few key members of the presynaptic proteome, including 
RIM1/2, Munc13, and bassoon (Bsn)’ (Fig. 1a). To explore the organ- 
ization of these proteins, we studied their subsynaptic distribution 
relative to postsynaptic scaffolding protein PSD-95 in cultured hip- 
pocampal neurons using 3D-STORM§ following immunolabelling 
using primary antibodies and Alexa647- or Cy3-tagged secondary 
antibodies (Fig. 1b). Paired synaptic clusters of active zone protein and 
PSD-95 with clear borders were selected. As a confirmation that these 
pairs constituted synapses, we measured the peak-to-peak distances 
between pre- and postsynaptic clusters and found them to be consistent 
with previous measurements” (Extended Data Fig. 1). 

The distribution of RIM1/2 within the active zone, measured as 3D 
local density, was distinctively non-uniform with notable high-density 
peaks, which we characterized as nanoclusters (Fig. 1c, e). We adapted 
an auto-correlation function (ACF) to test whether this distribution 
occurs more frequently than expected by chance. The measured ACF 
showed significant non-uniformity compared to random ensembles 
(Fig. 1d). Simulations showed that the distance for which the ACF was 
significantly elevated provided a means to estimate the nanocluster 
diameter (Extended Data Fig. 2a-c). The average estimated diameter 
of ~80 nm for RIM1/2 nanoclusters was very close to the reported 
size of PSD-95 and AMPA receptor (AMPAR) nanoclusters*®. Similar 
distribution and nanocluster properties were found using a different 
antibody targeted towards a separate epitope in RIM1 (Extended Data 
Fig. 2d). Isolated non-synaptic small groups of localizations showed a 
weaker ACF that was significant over a much smaller distance (Fig. 1d). 
This and other experiments suggest that the measured non-uniformity 
was not likely due to over-counting molecules or to potential artefacts 
of primary-secondary antibody labelling (Extended Data Fig. 3). 
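
As a rough illustration of the idea behind this test (not the adapted ACF described in Methods), the sketch below compares the pair-distance histogram of a set of 3D localizations with the mean histogram of randomized ensembles. The function names, the bin settings and the use of a bounding box for randomization are all assumptions made for the example.

```python
import math
import random


def pair_distance_histogram(points, bin_width, n_bins):
    """Count point pairs whose separation falls in each radial bin."""
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            b = int(math.dist(points[i], points[j]) // bin_width)
            if b < n_bins:
                counts[b] += 1
    return counts


def randomized_like(points, rng):
    """Uniform random points in the bounding box of `points` (same count)."""
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            for _ in points]


def autocorrelation(points, bin_width=10.0, n_bins=20, n_random=20, seed=0):
    """Ratio of measured pair counts to the mean over randomized ensembles.

    Values above 1 at short radii indicate clustering beyond chance;
    bins with no random pairs are reported as NaN.
    """
    rng = random.Random(seed)
    measured = pair_distance_histogram(points, bin_width, n_bins)
    random_sum = [0] * n_bins
    for _ in range(n_random):
        hist = pair_distance_histogram(randomized_like(points, rng),
                                       bin_width, n_bins)
        random_sum = [a + b for a, b in zip(random_sum, hist)]
    return [m * n_random / r if r else float("nan")
            for m, r in zip(measured, random_sum)]
```

In this simplified form, the radius at which the ratio returns to about 1 gives a crude indication of cluster size, in the spirit of the diameter estimate described above.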

To directly compare the nanoscale organization of key active zone 
proteins, we developed an algorithm that identified nanoclusters based 
on local densities (Fig. 1e). Nanoclusters of each protein were more 
likely to be located near the centre of synapses than near the edge (Fig. 1f, 
Extended Data Fig. 2i). Compared to PSD-95 as the common con- 
trol in pairwise two-colour experiments, there were similar numbers 
of RIM1/2, more Munc13, and fewer Bsn nanoclusters per synapse 
(Fig. 1h). Comparisons between these three proteins suggested that 
Munc13 had a wider distribution than RIM1/2 across the active zone 
and the distribution of Bsn was closer to uniform throughout the syn- 
apse (Fig. 1g-i, Extended Data Fig. 2f-n). Together, these observations 
revealed a complex and heterogeneous molecular architecture within 
single synapses, typified by dense assemblies of fusion-associated 
proteins nearer the centre. 

To examine the potential functional impact of the active zone nano- 
clusters on vesicle fusion'®!!, we sought to directly map the distribution 
of vesicle fusion sites over multiple release events within individual 
boutons. To do so, we adapted analysis for single-molecule localization 
to signals from single-vesicle fusion obtained with vGlut1-pHluorin- 
mCherry (vGpH). Neurons were cotransfected with cyan fluorescent 
protein (CFP)-tagged synapsin1a (Syn1a), a vesicle-associated protein 
that marks boutons, and vGpH, which increases in green fluorescence 
intensity upon vesicle fusion’. Single electrical field stimuli evoked 
vesicle fusion (Fig. 2a, b, Extended Data Fig. 4a) with a release prob- 
ability (Pr) of 0.11 ± 0.01 (mean ± s.e.m.) per bouton, comparable to 
previous measurements, which was also sensitive to extracellular Ca2+ 
(Extended Data Fig. 4b-d), as expected. In the presence of TTX, the 
frequency of action-potential-independent spontaneous release events 
detected with vGpH was similar to the frequency of NMDA receptor 
(NMDAR)-dependent postsynaptic Ca2+ transients measured sepa- 
rately using the Ca2+ sensor GCaMP6f (Extended Data Fig. 5a). 

To determine whether these evoked fusion events represent single- or 
multi-vesicular fusion, we compared them with spontaneous release 
under TTX conditions (Fig. 2a-c), which most likely arises from single 
vesicle fusion’. By fitting the photon number distributions of evoked 
and spontaneous events, we estimated that ~72-82% of evoked events 
arose from single-vesicle fusion (Fig. 2c). With the majority of evoked 
release stemming from single-vesicle fusion, the location of fusion 
may be deduced by mathematically fitting the fluorescence profile cap- 
tured immediately after fusion (Fig. 2d), analogous to single-molecule 


1Department of Physiology, University of Maryland School of Medicine, Baltimore, Maryland 21201, USA. 2Program in Neuroscience, University of Maryland School of Medicine, Baltimore, 
Maryland 21201, USA. 3Medical Scientist Training Program, University of Maryland School of Medicine, Baltimore, Maryland 21201, USA. 4Cell Biology, Department of Biology, Faculty of Science, 
Utrecht University, 3584 CH Utrecht, The Netherlands. 
*These authors contributed equally to this work. 


210 | NATURE | VOL 536 | 11 AUGUST 2016 




[Figure 1 image. Recoverable labels: RIM1/2, PSD-95, Munc13, AMPAR; Localizations, Density map, Randomized; Isolated localizations, Random; axes: Radius (nm), Local density, d (nm), NC volume.] 

Figure 1 | Vesicle release proteins form subsynaptic nanoclusters. 

a, Colour-coded schematic of studied synaptic proteins. AZ, active zone; 
PSD, postsynaptic density. b, Synapses labelled with RIM1/2 and PSD-95 
imaged using 3D-STORM (10-nm pixels) compared to wide-field 
composite (bottom corner, 100-nm pixels). Scale bar, 2 µm. Boxed synapse 
enlarged in original (top) and rotated (bottom) angles. Scale bar, 200 nm. 
c, En face (top) and side (bottom) views of a RIM1/2 cluster showing 

all localizations and local density maps for a measured synaptic cluster 
compared to a simulated randomized cluster. Scale bar, 200 nm. d, Auto- 
correlation functions of measured RIM1/2 (n= 115), isolated non-synaptic 
small groups of localizations due to repetitive switching of fluorophores 
(n= 42), and simulated randomized (n = 115) distributions. e, RIM1/2 
nanoclusters (red) within a synaptic cluster. f, Distribution of nanocluster 
distances from the centre of synapses normalized to randomized 
distribution. g, Molecule density inside nanoclusters (NC) normalized to 
synaptic average. h, Average number of protein nanoclusters per synapse. 
i, Cumulative distributions of nanocluster volumes. *P < 0.05; **P < 0.01; 
***P < 0.001, one-way ANOVA on ranks with pairwise comparison 
procedures (Dunn's method) for g, h and Kolmogorov-Smirnov test for 

i. All experiments were repeated >3 times. Also see Extended Data Fig. 3 
and Supplementary Table 1. 


localization techniques'*. For our median count of 518 photons per 
localization, the effective localization precision was in practice limited 
by vesicle diameter. In individual boutons, multiple evoked or spon- 
taneous single-vesicle fusion events were used to generate maps that 
defined the areas over which vesicle fusion occurred (Fig. 2e, Extended 
Data Fig. 4e-1). We called this approach ‘pHluorin uncovering sites of 
exocytosis’ or pHuse. 
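
The localization step can be illustrated with a simplified stand-in. The sketch below uses an intensity-weighted centroid instead of a profile fit, together with the lowest-order photon-limited precision estimate (PSF width divided by the square root of the photon count). The function names and the PSF width used in the usage note are assumptions for illustration, not values from this study.

```python
import math


def weighted_centroid(image, pixel_size_nm):
    """Intensity-weighted centroid (x, y), in nm, of a background-subtracted image.

    `image` is a list of rows; pixel centres sit at (index + 0.5) pixels.
    """
    total = sum(sum(row) for row in image)
    if total <= 0:
        raise ValueError("image has no signal")
    x = sum(v * (col + 0.5) for row in image for col, v in enumerate(row)) / total
    y = sum(v * (r + 0.5) for r, row in enumerate(image) for v in row) / total
    return x * pixel_size_nm, y * pixel_size_nm


def photon_limited_precision(psf_sigma_nm, photons):
    """Lowest-order localization precision estimate, sigma / sqrt(N)."""
    return psf_sigma_nm / math.sqrt(photons)
```

For example, with an assumed PSF standard deviation of 150 nm and the median of 518 photons, `photon_limited_precision(150, 518)` is about 6.6 nm. A photon-limited term of this size would be consistent with the statement above that precision was in practice limited by vesicle diameter.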

Fusion site areas for spontaneous and evoked vesicle fusion tightly 
correlated with bouton areas measured by Syn1a (Fig. 2f), as expected. 
However, the slopes of the correlations differed, even though the bou- 
ton sizes were similar between groups (Extended Data Fig. 5b). In fact, 
evoked fusion site areas were significantly smaller (median smaller 
by 48%) and occurred over a significantly smaller proportion of the 
bouton (median smaller by 39%) than spontaneous fusion (Fig. 2g, 
Extended Data Fig. 5c, d, h-j). 

One interpretation is that the concentration of vesicle priming 
proteins in nanoclusters favours evoked fusion in these subregions of 
the active zone. This predicts that pHuse events would be associated 
with higher local RIM1 density and conversely that high local den- 
sity of RIM1 increases the probability of nearby fusion. To assess these 
predictions, we mapped vesicle fusion sites relative to Eos3-tagged 
RIM1 using sequential PALM-pHuse imaging on the same live bou- 
tons (Fig. 2h, Extended Data Fig. 6d, e). As a local density metric for 
RIM1, we applied Voronoi tessellation and measured the first-rank 
density (δ¹) for each RIM1-mEos3 localization (as described in ref. 15). 
Figure 2 | Release site mapping by pHuse in single synapses shows 
RIM predicts evoked fusion distribution. a, Neurons co-expressing 
Syn1a-CFP (top; scale bars, 5 µm), identifying synaptic boutons, and 
vGpH (bottom; scale bars, 500 nm), used to detect vesicle fusion with 
fluorescence increases from single action potential (AP)-evoked and 
spontaneous release. b, Example of fluorescence traces from evoked 
and spontaneous events over repeated trials at single boutons. c, Photon 
count distributions for detected spontaneous events fit with a normal 
distribution (µ = 512, σ = 167) and evoked events fit with a mixture of 
two normal distributions (µ1 = 542, σ1 = 143, µ2 = 912, σ2 = 319). Filled 
circles with error bars show mean ± s.d. of normal curves. d, Image 
processing steps in pHuse to determine fusion site locations. e, Fusion 
sites (green points) and area of fusion (blue line) from boutons of different 
sizes defined with Syn1a (white). Scale bar, 500 nm. f, Correlation 
between fusion area and bouton size, linear fit. Correlations are 
significantly different, ANCOVA, F1,171 = 5.01. g, Cumulative distributions 
of fusion areas normalized to bouton size (Kolmogorov-Smirnov test, 
**D = 0.26). f, g, nspontaneous = 77/22, nevoked = 104/28. h, Tessellated RIM1- 
mEos and pHuse localizations over the same boutons. Scale bars, 200 nm. 
i, Tesseler first-rank density (δ¹) for RIM1 measured versus randomized 
distributions as a function of distance from pHuse localizations. j, 
Comparison within boutons of average δ¹ for RIM1 localizations within 
40 nm to a pHuse localization versus not. k, Average nearest pHuse 
distance as a function of RIM1 δ¹. i, j, n = 26/13. *P < 0.05, **P < 0.01, 
***P < 0.001. n given in synapses/experiments. Also see Extended Data 
Figs 4-6. 


The distribution of RIM1-mEos3 was non-uniform and contained 
nanoclusters with an average diameter of 80.95 ± 5.34 nm and 
78.93 ± 5.85 nm using either an adapted SR-Tesseler analysis15 or 
nearest neighbour distance analysis’, respectively (Extended Data 
Fig. 6f), consistent with our 3D-STORM results (Fig. 1). We then com- 
pared δ¹ as a function of distance from the nearest pHuse localization 
for the measured RIM1 distributions versus randomized RIM1 distri- 
butions generated from the same number of localizations over the same 
area. Indeed, near pHuse sites, the average RIM1 δ¹ was significantly 
greater than chance (Fig. 2i). Furthermore, within individual boutons, 
RIM1 molecules within 40 nm of a pHuse location had significantly 
higher δ¹ than those further away (Fig. 2j). Conversely, considering 
all individual RIM1 localizations, the distance from the nearest 
pHuse localization decreased as a function of RIM1 δ¹ (Fig. 2k). 
Figure 3 | Trans-synaptic nanoscale alignment of active zone and PSD 
proteins. a, Distributions of synaptic RIM1/2 and PSD-95 pair as the 
original localizations (left) and with nanoclusters highlighted (right). Scale 
bar, 200 nm. Filled arrows indicate aligned nanoclusters, open arrows 
denote non-aligned nanoclusters. b, Paired correlation function (PCF) of 
measured RIM1/2 and PSD-95 compared to PCF with either distribution 
randomized. c, PCF of simulated distributions with (cyan) and without 
(orange) shuffling nanocluster positions. d, Cumulative distributions of 
cross-correlation index (n = 143 synapses). e, RIM1/2 protein enrichment 
as a function of distance from translated PSD-95 nanocluster centres (top, 
filled points) and PSD-95 enrichment relative to RIM1/2 nanoclusters 
(bottom, open points). Simulations with same randomizations as in 
d, e were performed for each synapse. f, Protein density profile for enriched 
versus non-enriched nanoclusters, n = 119 PSD-95 nanoclusters, 90 RIM1/2 
nanoclusters. g, Enrichment indices for RIM1/2, Munc13, and Bsn relative 
to PSD-95 nanoclusters (filled) and for the opposite direction (open), 
n> 260 nanoclusters, *P < 0.05; **P<0.01, ANOVA on ranks with Dunn’s 
method. h, GluA2 enrichment with respect to RIM1/2 nanoclusters, n = 36 
synapses. Scale bar, 100 nm. All experiments were repeated >3 times. Also 
see Extended Data Fig. 6 and Supplementary Table 2. 


Thus, nanodistribution of RIM predicts the local probability of evoked 
fusion. 

For the synapse as a whole, the impact of presynaptic nanoscale 
organization and confined vesicle sites (Figs 1 and 2) will depend 
strongly on whether these RIM nanoclusters align with postsynaptic 
receptor nanoclusters’. To assess this, we compared the distribution 
of PSD-95 over the face of individual synapses to the corresponding 
distributions of RIM1/2, as the PSD-95 nanoclusters concentrate 
higher density of receptors’. An example synapse, presented in 
Fig. 3a (Supplementary Video 1), shows three RIM1/2 nanoclusters 
and three PSD-95 nanoclusters that appear well-aligned and one pair 
not aligned. We used two independent approaches to assess the rela- 
tionship between active zone and postsynaptic density (PSD) protein 
distributions. First we adapted a paired cross-correlation function 
(PCF) to measure the spatial relationship between the two distribu- 
tions (see Methods). The measured active zone-PSD distributions 
showed a significantly elevated PCF compared to simulated active 
zone-PSD distributions with either distribution fully randomized 
(Fig. 3b). We then tested the contribution of nanocluster positions to this 
elevated PCF (Fig. 3c). Randomizing nanocluster positions and out-of- 
nanocluster molecules (keeping localizations within nanocluster bor- 
ders intact) abolished the PCF to chance level, while randomizing just 
the out-of-nanocluster molecules only modestly reduced the PCF, 
indicating that the precise positioning of the nanoclusters themselves 
dominate the overall correlation of protein distributions (Fig. 3c, d). 

Second, we reasoned that if synapses were trans-synaptically aligned 
on the nanoscale level, the protein distribution on one side of the syn- 
apse would predict protein density in the opposing neuron. To test 
this, we measured RIM1/2 localization densities as a function of radial 
distance from the centres of PSD-95 nanoclusters as translated across 
the synaptic cleft (Fig. 3e). RIM1/2 localization densities within a 60 nm 
radius were significantly higher than the synaptic cluster average, 
decaying e-fold per 43.2 ± 12.1 nm away from the peak. This enrich- 
ment was again principally dependent on the relative positioning of 
nanoclusters within synaptic clusters (Fig. 3e). For each individual 
nanocluster, we defined an enrichment index as the average molecular 
density of the opposed protein within a 60 nm radius from the nano- 
cluster centre (Extended Data Fig. 7a). Nanoclusters with enrichment 
indices significantly greater than that of the fully randomized distri- 
bution were considered enriched (Fig. 3f). We found 44.4 ± 3.0% of 
PSD-95 nanoclusters to be enriched (Extended Data Fig. 7b), and these 
nanoclusters were opposed to RIM1/2 molecule densities that were 
2.0 ± 0.1 times the average RIM1/2 synaptic cluster density (Fig. 3f). 
A similar PSD-95 protein enrichment profile was found relative to the 
centres of RIM1/2 nanoclusters (Fig. 3e). Thus, this detailed metric for 
assessing nanoscale alignment revealed strong co-enrichment of these 
key proteins along narrow, transcellular columns. In comparison to 
RIM1/2, the enrichment of Munc13 with respect to PSD-95 nano- 
clusters was considerably weaker, and Bsn intermediate (Fig. 3d, g, 
Extended Data Fig. 7c-e, Supplementary Table 2). Together, both the 
PCFs and protein enrichment analyses revealed significant trans-synaptic 
alignment between RIM1/2 and PSD-95 distributions, largely stem- 
ming from the correlated positions of their respective nanoclusters. 
We likewise found quantitatively similar number, characteristics, and 
alignment of pre- and postsynaptic nanoclusters in acute hippocampal 
slices from adult rats (Extended Data Fig. 7f-h). 
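
The enrichment index defined above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code: the function name is hypothetical, the nanocluster centre is assumed to have already been translated across the cleft, the local volume is taken as a full sphere, and the index is expressed relative to the average density of the opposed protein over its synaptic cluster.

```python
import math


def enrichment_index(centre, opposed_points, cluster_volume_nm3, radius_nm=60.0):
    """Local density of the opposed protein near `centre`, relative to its
    average density over the whole synaptic cluster.

    `centre` and each entry of `opposed_points` are (x, y, z) tuples in nm.
    """
    n_total = len(opposed_points)
    if n_total == 0 or cluster_volume_nm3 <= 0:
        raise ValueError("need localizations and a positive cluster volume")
    # Count opposed-protein localizations inside the sphere around the centre.
    n_local = sum(1 for p in opposed_points if math.dist(centre, p) <= radius_nm)
    local_density = n_local / (4.0 / 3.0 * math.pi * radius_nm ** 3)
    average_density = n_total / cluster_volume_nm3
    return local_density / average_density
```

Under this convention a value of 1 means no enrichment, and a value near 2 would correspond to the roughly twofold RIM1/2 density reported opposite enriched PSD-95 nanoclusters. Deciding whether a given nanocluster counts as enriched still requires comparison with the randomized distributions, which this sketch does not perform.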

To determine whether evoked release aligns with postsynaptic recep- 
tors, we compared distributions of GluA2-containing AMPARs with 
RIM1/2 (Fig. 3h). Similar to PSD-95, GluA2 was significantly enriched 
relative to RIM1/2 nanoclusters, decaying e-fold per 66.9 ± 15.4 nm. 
This was further confirmed with a different GluA2/3 antibody 
(Supplementary Table 2). Importantly, because the decline in the 
probability of AMPAR activation with distance from glutamate release 
sites has previously been deduced*'*, we can predict synaptic potency by 
using the observed RIM1/2 and receptor distributions. To estimate the 
physiological impact of this trans-synaptic alignment, we calculated 
receptor activation in a measured synapse versus randomized distri- 
butions. Consistent with effect sizes posited by previous models*>!”, 
the measured distribution with trans-synaptic alignment gained 
21.8 ± 0.5% in synaptic strength compared to a uniform distribution 
of active zone and PSD proteins (Extended Data Fig. 8), suggesting this 
synaptic architecture facilitates higher single-vesicle response potency. 
For comparison, long-term depression induces a very similar magni- 
tude decrease in synaptic strength'®. 

Notably, we found that trans-synaptic molecular alignment may 
extend deeper into the postsynaptic cell, as postsynaptic scaffold mol- 
ecules farther from the plasma membrane also colocalized with PSD-95 
nanoclusters (Extended Data Fig. 9a, c), and RIM1/2 was correspond- 
ingly enriched with respect to Shank nanoclusters (Extended Data 
Fig. 9b). 3D-STORM imaging of RIM1/2, PSD-95, and GKAP1 at 
the same synapses further confirmed their mutual co-enrichment 
(Extended Data Fig. 9d-f). Altogether, these results revealed an 
Figure 4 | Retrograde plasticity of synaptic nanoscale alignment. 

a, Distributions of synaptic RIM1/2 and PSD-95 for control and post-LTP 
induction conditions with nanoclusters highlighted. b, c, Across-condition 
comparison of enrichment index and percentage of nanoclusters enriched 
(n= 45, 87 and 42 synapses for control, LTP and AP5, respectively). 

d, Distributions of RIM1/2 and PSD-95 for conditions following NMDA 
stimulation. Scale bar, 100 nm. e-i, Across-conditions comparison of 
RIM1/2 and PSD-95. Dark red in i represents RIM1/2 nanoclusters 
enriched with PSD-95 and light red the unenriched nanoclusters. n = 61, 
96, 77 and 74 synapses for control, NMDA, washout and AP5, respectively. 
j, Schematic summarizing the reorganization of nanoclusters during 
NMDA-induced plasticity and recovery. *P < 0.05; **P < 0.01, ANOVA 
on ranks with pairwise comparison (Dunn's method), and χ2 test for the 
proportion. All experiments were repeated >3 times. 


axially oriented molecular ensemble spanning the cleft within the 
bounds of the synapse, evoking the concept of a trans-synaptic nano- 
column enriched with key proteins that regulate synaptic transmission 
(Extended Data Fig. 9g). The graded protein densities involved suggest 
this may not be a clearly delineated structural element. Nevertheless, 
sensitivity of PSD-95 nanocluster size to latrunculin* further 
suggests that the spine cytoskeleton is engaged at the base of the 
column. Because actin executes many aspects of synaptic plasticity, 
this provides a potential means by which synaptic strength may be 
dynamically tuned. 

Consequently, we speculated that nanoscale alignment might 
be altered during synaptic plasticity. To test this, we induced long- 
term potentiation via glycine stimulation and withdrawal of the 
NMDAR antagonist D,L-2-amino-5-phosphonovaleric acid (AP5)), 
which resulted in an increase in PSD-95 localization density within 
nanoclusters, in the percentage of PSD-95 nanoclusters enriched 
with RIM1/2, and in the enrichment index of PSD-95 nanoclusters 
(Fig. 4a–c, Extended Data Fig. 10m). These changes were prevented by 
co-application of AP5 (Fig. 4a–c, Extended Data Fig. 10m). Notably, no 
changes in RIM1/2 were observed, consistent with LTP as a primarily 
postsynaptic phenomenon. 

We next tested an acute 5-min activation of NMDARs, known to 
induce a sustained depression of synaptic strength²⁰,²¹. Following this 
stimulus, postsynaptic nanostructure was markedly disrupted in the 
generally opposite manner, with the synaptic cluster volume of PSD-95 
and the number, volume, and protein density of PSD-95 nanoclusters 
all reduced (Fig. 4d–f, Supplementary Table 3). These effects were 
long-lasting, and during the subsequent 25 min, most parameters 
underwent only partial recovery. In contrast, presynaptic nanostruc- 
ture underwent a remarkably different pattern of reorganization that 
was detectable only in relation to PSD-95 nanoclusters. Unlike PSD-95, 
RIM1/2 distributions were not affected immediately following the stim- 
ulus (Fig. 4d-f). However, following the 25-min recovery, the enrich- 
ment index of RIM1/2 with respect to PSD-95 nanoclusters increased 
with a corresponding increase in the percentage of enriched PSD-95 
nanoclusters (Fig. 4g, h). Remarkably, while RIM1/2 nanoclusters 
altogether remained constant in number and enriched percentage, 
there was in fact an increase in the size of those RIM1/2 nanoclusters 
that were enriched with PSD-95, whereas the other non-enriched 
RIM1/2 nanoclusters remained constant (Fig. 4i). Similar results were 
found when we studied NMDA-induced changes on RIM1/2 and 
GluA2/3 alignment (Extended Data Fig. 10a–h). Note that on a traditional 
microscopic level, these changes to presynaptic organization 
were essentially undetectable: RIM1/2 staining revealed no change in 
synaptic cluster size or intensity at any point. Because the delayed 
presynaptic modification was specific to aligned nanoclusters, it may 
be that nanocolumns point to an alignment-specific, retrograde pre- 
synaptic compensation following postsynaptic depression (Fig. 4j), 
potentially relating to previous reports of presynaptic homeostatic 
plasticity²². 

Overall, the gradients of protein density we observed suggest a nano- 
column model, in which active zone regions with the highest likeli- 
hood of release are aligned to the densest receptor areas, optimizing 
the potency of neurotransmission (Supplementary Video 2). This pro- 
vides a simple organizational principle that may hold for many small, 
central nervous system synapses, and will have the largest influence 
at synapses that typically release only one vesicle following an action 
potential. The compartmentalized active zone architecture is reminiscent 
of protein organization in Drosophila neuromuscular junction²³ 
and vertebrate ribbon synapses, where vesicles and priming proteins 
are arrayed around tight clusters of Ca²⁺ channels. However, observations 
in small central nervous system synapses of both clustered²⁴,²⁵ 
and random distributions of Ca²⁺ channels²⁶, and emerging evidence 
for channel mobility as an equalizer of Pᵣ for vesicles independent of 
channel positioning²⁷, suggest that their precise distribution may not 
be the sole determinant of the active zone release likelihood landscape. 

The alignment of pre and postsynaptic nanoscale subdomains*° 
suggests that even small synapses may be composed of dynamic functional 
modules²⁸,²⁹. We hypothesize that the nanocolumn represents 
an especially sensitive point whereby disease-associated pathways, 
frequently known to alter synaptic plasticity¹,², may disrupt synapse 
function. It will be important to identify which, if any, of the numerous 
cleft-spanning adhesion systems³⁰ or retrograde signalling mechanisms 
mediate release-receptor alignment and permit dynamic trans-synaptic 
realignment. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 24 July 2015; accepted 27 June 2016. 
Published online 27 July 2016. 


1. Fromer, M. et al. De novo mutations in schizophrenia implicate synaptic 
networks. Nature 506, 179-184 (2014). 

2. Volk, L., Chiu, S.-L., Sharma, K. & Huganir, R. L. Glutamate synapses in human 
cognitive disorders. Annu. Rev. Neurosci. 38, 127-149 (2015). 

3. Franks, K. M., Stevens, C. F. & Sejnowski, T. J. Independent sources of quantal 
variability at single glutamatergic synapses. J. Neurosci. 23, 3186-3195 
(2003). 

4. MacGillavry, H. D., Song, Y., Raghavachari, S. & Blanpied, T. A. Nanoscale 
scaffolding domains within the postsynaptic density concentrate synaptic 
AMPA receptors. Neuron 78, 615-622 (2013). 

5. Nair, D. et al. Super-resolution imaging reveals that AMPA receptors inside 
synapses are dynamically organized in nanodomains regulated by PSD95. 
J. Neurosci. 33, 13204-13224 (2013). 


11 AUGUST 2016 | VOL 536 | NATURE | 213 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




6. Fukata, Y. et al. Local palmitoylation cycles define activity-regulated 
postsynaptic subdomains. J. Cell Biol. 202, 145-161 (2013). 

7. Südhof, T. C. The presynaptic active zone. Neuron 75, 11-25 (2012). 

8. Huang, B., Wang, W., Bates, M. & Zhuang, X. Three-dimensional super-resolution 
imaging by stochastic optical reconstruction microscopy. Science 
319, 810-813 (2008). 

9. Dani, A., Huang, B., Bergan, J., Dulac, C. & Zhuang, X. Superresolution imaging 
of chemical synapses in the brain. Neuron 68, 843-856 (2010). 

10. Park, H., Li, Y. & Tsien, R. W. Influence of synaptic vesicle position on 
release probability and exocytotic fusion mode. Science 335, 1362-1366 
(2012). 

11. Watanabe, S. et al. Ultrafast endocytosis at mouse hippocampal synapses. 
Nature 504, 242-247 (2013). 

12. Balaji, J. & Ryan, T. A. Single-vesicle imaging reveals that synaptic vesicle 
exocytosis and endocytosis are coupled by a single stochastic mode. Proc. Natl 
Acad. Sci. USA 104, 20576-20581 (2007). 

13. Leitz, J. & Kavalali, E. T. Fast retrieval and autonomous regulation of single 
spontaneously recycling synaptic vesicles. eLife 3, e03658 (2014). 

14. Betzig, E. Single molecules, cells, and super-resolution optics (Nobel Lecture). 
Angew. Chem. Int. Ed. 54, 8034-8053 (2015). 

15. Levet, F. et al. SR-Tesseler: a method to segment and quantify localization-based 
super-resolution microscopy data. Nat. Methods 12, 1065-1071 
(2015). 

16. Raghavachari, S. & Lisman, J. E. Properties of quantal transmission at CA1 
synapses. J. Neurophysiol. 92, 2456-2467 (2004). 

17. Tarusawa, E. et al. Input-specific intrasynaptic arrangements of ionotropic 
glutamate receptors and their impact on postsynaptic responses. J. Neurosci. 
29, 12896-12908 (2009). 

18. Dudek, S. M. & Bear, M. F. Homosynaptic long-term depression in area CA1 
of hippocampus and effects of N-methyl-d-aspartate receptor blockade. 
Proc. Natl Acad. Sci. USA 89, 4363-4367 (1992). 

19. Araki, Y., Zeng, M., Zhang, M. & Huganir, R. L. Rapid dispersion of SynGAP from 
synaptic spines triggers AMPA receptor insertion and spine enlargement 
during LTP. Neuron 85, 173-189 (2015). 

20. Lee, H.-K., Kameyama, K., Huganir, R. L. & Bear, M. F. NMDA induces long-term 
synaptic depression and dephosphorylation of the GluR1 subunit of AMPA 
receptors in hippocampus. Neuron 21, 1151-1162 (1998). 

21. Sanderson, J. L. et al. AKAP150-anchored calcineurin regulates synaptic 
plasticity by limiting synaptic incorporation of Ca²⁺-permeable AMPA 
receptors. J. Neurosci. 32, 15036-15052 (2012). 

22. Davis, G. W. & Müller, M. Homeostatic control of presynaptic neurotransmitter 
release. Annu. Rev. Physiol. 77, 251-270 (2015). 

23. Liu, K. S. et al. RIM-binding protein, a central part of the active zone, is essential 
for neurotransmitter release. Science 334, 1565-1569 (2011). 




24. Holderith, N. et al. Release probability of hippocampal glutamatergic terminals 
scales with the size of the active zone. Nat. Neurosci. 15, 988-997 (2012). 

25. Nakamura, Y. et al. Nanoscale distribution of presynaptic Ca²⁺ channels and its 
impact on vesicular release during development. Neuron 85, 145-158 (2015). 

26. Scimemi, A. & Diamond, J. S. The number and organization of Ca²⁺ channels 
in the active zone shapes neurotransmitter release from Schaffer collateral 
synapses. J. Neurosci. 32, 18157-18176 (2012). 

27. Schneider, R. et al. Mobility of calcium channels in the presynaptic membrane. 
Neuron 86, 672-679 (2015). 

28. Tarr, T. B., Dittrich, M. & Meriney, S. D. Are unreliable release mechanisms 
conserved from NMJ to CNS? Trends Neurosci. 36, 14-22 (2013). 

29. Lisman, J. & Raghavachari, S. A unified model of the presynaptic and 
postsynaptic changes during LTP at CA1 synapses. Sci. STKE 2006, re11 
(2006). 

30. Missler, M., Südhof, T. C. & Biederer, T. Synaptic cell adhesion. Cold Spring Harb. 
Perspect. Biol. 4, a005694 (2012). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank S. Thompson, T. Abrams, S. Jurado and 
G. Wittenberg for advice and comments, P. Kaeser for advice on RIM expression 
and RIM antibodies, Y. Araki and R. Huganir for advice on chemLTP, and 
S. S. Divakaruni for advice and initial tests of chemLTP. We thank P. Kaeser for 
the gift of RIM1-mVenus, T. Ryan for vGlut1-pHluorin-mCherry, G. Augustine for 
Syn1a-CFP, and M. Contreras for technical assistance. This work was supported 
by F30-MH105111 to H.C., F30-MH102891 to T.P.L., F31-MH105105 to S.R.M., 
T32-GM008181 to H.C. and S.R.M., R01-MH080046 and NS090644 to T.A.B., 
and a gift from the Kahlert Foundation to T.A.B. 


Author Contributions A.T. and H.C. performed STORM experiments, A.T. 
designed 3D-STORM analysis, H.C. performed and analysed pHuse and RIM 
PALM experiments, T.P.L. and A.T. performed simulations, S.R.M. performed 
GCaMP imaging and nanobody STORM experiments, H.D.M. performed PSD 
PALM experiments, and A.T., H.C. and T.A.B. designed the experiments and wrote 
the manuscript. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial 
interests. Readers are welcome to comment on the online version of the 
paper. Correspondence and requests for materials should be addressed to 
T.A.B. (tblanpied@som.umaryland.edu) or A.T. (tangaihui@gmail.com). 


Reviewer Information Nature thanks S. Sigrist, X. Zhuang and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 


METHODS 


All experimental protocols were approved by the University of Maryland, 
Baltimore School of Medicine Institutional Animal Care and Use Committee. 
Dissociated hippocampal neurons from E18 SD rats of both sexes were prepared 
as described previously*’. To increase the experiment efficiency, for three-colour 
STORM experiments we used the ‘sandwich’ cultures with a supporting astroglial 
monolayer as described previously” in which most neuronal structures were in 
the same focal plane. All experiments were performed on neurons 14-21 DIV and 
repeated on 3 or more separate cultures unless otherwise specified. 
Immunostaining. Cells were fixed with 4% paraformaldehyde (PFA) and 4% 
sucrose in PBS (pH 7.4) for 10 min at room temperature (RT), followed by washing 
with 50 mM glycine in PBS. Cells were then permeabilized and blocked using 
3% BSA or 5-10% donkey or goat serum in PBS with 0.1% Triton X-100, followed 
by incubation with primary antibody (3 h at RT or overnight at 4°C) and secondary 
antibodies (1 h at RT). 

For comparisons of Munc13 or RIM1/2 with Bsn made using 3D-STORM, 
mouse anti-Bsn (1:500, Enzo) was used with either rabbit anti-RIM1/2 (1:500; 
Synaptic Systems No. 140203) or rabbit anti-Munc13 (1:500; Synaptic Systems No. 
126103). Cy3 or Alexa-647 conjugated goat or donkey anti-rabbit or anti-mouse 
secondary antibodies (1:200 in PBS; JacksonImmuno) were used*’. For compar- 
isons of Munc13 and RIM1/2, staining was performed sequentially separated by 
additional blocking steps of incubation with rabbit serum at RT for 30 min followed 
by incubation with excess unconjugated anti-rabbit Fab antibody for 1h at RT. 
For this set of experiments, all permutations of the order in which the primary 
antibody was applied and the fluorophore used to label each protein were included. 
For trans-synaptic measurements, rabbit anti-Munc13, anti-RIM1/2, anti-RIM1 
(1:500; Synaptic Systems No.140003) or anti-Bsn (1:500, Cell Signaling), were used 
with mouse anti-PSD-95 (1:200; Neuromab), mouse anti-GluA2 (1:100, Millipore), 
or rabbit anti-GluR2/3 (1:100, Millipore). Unless specified otherwise, presynaptic 
proteins were labelled with donkey anti-rabbit IgG conjugated with Alexa-647 
and postsynaptic PSD-95 was labelled with donkey anti-mouse IgG conjugated 
with Cy3. For comparison of directly labelled primary antibody with primary- 
secondary antibody labelling, we directly conjugated Alexa-647 dye to anti-PSD-95 
antibody and purified antibody using illustra NAP Columns (GE Healthcare). 
For comparison of nanobody labelling of expressed GFP-tagged knockdown- 
rescue PSD-95 with primary-secondary antibody labelling, we used GFP-booster 
(1:200, Chromotek). More information on antibodies used can be found in the 
Supplementary Information. 

Tissue slice staining was performed essentially as previously described?**. 
Briefly, 1-mm thick blocks of hippocampal tissue from 5-7-week-old male SD 
rats were fixed with ice-cold 4% PFA for 15 min and then dehydrated with 30% 
sucrose in PBS. Cryostat sections with 40 µm thickness were made, permeabilized 
and blocked with 10% donkey serum and 0.3% Triton X-100 in PBS/glycine for 
1h. PSD-95 and RIM1/2 were labelled with the same antibody concentration as 
was used in cell culture. 
3D-STORM imaging. Imaging was performed on an Olympus IX81 ZDC2 
inverted microscope with a 100×/1.49 TIRF oil-immersion objective. Excitation 
light was reflected to the sample via a 405/488/561/638 quad-band polychroic 
(Chroma). The typical incident power was ~30 mW for 647 nm and ~60 mW for 
561 nm (measured through the objective). To reduce background fluorescence 
while maximizing the depth of view, we adjusted the incident angle of the excitation 
beam to near but less than the critical angle, to achieve oblique illumination of the 
sample. Emission was passed through a Photometrics DV2 which split the emission 
at 565 nm and directed the red and far-red bands through matched filters (595/50 
and 655 long-pass) onto an iXon+ 897 EM-CCD camera (Andor). A cylindrical 
lens (focal length = 30 cm) was inserted in each path of the splitting cassette of the 
DV2 to create the astigmatism for 3D imaging. All hardware was controlled via 
iQ software (Andor). Z stability was maintained by the Olympus ZDC2 feedback 
positioning system. Lateral drift was corrected with a cross-correlation drift- 
correction approach***, 

Labelled cells and tissue slices were imaged in a STORM imaging buffer freshly 
made before experiments containing 50 mM Tris, 10 mM NaCl, 10% glucose, 
0.5 mg/ml glucose oxidase (Sigma), 40 µg/ml catalase (Sigma), and 0.1 M cysteamine 
(Sigma). For tissue slices, the focal plane was set to within 1.5 µm from the glass 
coverslip to obtain the best signal-to-noise ratio. Imaging was performed as 
previously described*“*. TetraSpeck beads (100 nm; Invitrogen) deposited on a 
coverslip were localized to correct alignment between the two channels as 
described previously’. The average deviation of the bead localizations after cor- 
rection was between 10 and 15nm. To calibrate the 3D positions of localizations, a 
z-stack with 30-nm steps was collected on the same coverslip with beads. The aver- 
age deviation of localized z-positions of immobilized fluorophores was 40-50 nm. 
Three-colour 3D-STORM. Three-colour STORM was performed with two 
sequential sets of two-colour 3D-STORM on RIM1/2-PSD-95 as a pair and then 
GKAP1-PSD-95 as a pair. Cells were immunolabelled with mouse anti-PSD-95, 
rabbit anti-RIM1/2, and mouse anti-GKAP1 (1:200, Neuromab). PSD-95 and 
RIM1/2 were then immunolabelled with secondary antibodies conjugated to 
Alexa647 and Cy3, respectively. After >20 min of continuous excitation by 
high-powered lasers during the first round of imaging, the majority of Cy3 mole- 
cules (RIM1/2) became bleached. After acquisition of the first set of data, GKAP1 
was then labelled with secondary antibody conjugated to Cy3 while the coverslip 
remained on the microscope. The two sets of data were aligned post hoc based on 
Alexa647 (PSD-95) localizations. Because RIM1/2 and GKAP1 are not overlapping 
proteins, in the second imaging set, those Cy3 localizations within the RIM cluster 
borders potentially arising from the small, unbleached fraction of RIM-Cy3 were 
rejected from GKAP1 localizations. 

PALM-STORM Imaging. PALM imaging of PSD-95 concurrent with STORM 
imaging of GKAP or Shank (1:200, Neuromab) was performed as previously 
described’. 

Single-molecule localization and analysis. All data analysis was performed offline 
using custom routines in MATLAB (Mathworks). Molecule locations were deter- 
mined by fitting an elliptical 2D Gaussian function to an 11 x 11 pixel array (pixel 
size 100nm) surrounding the peak‘. The lateral (x, y) and axial (z) coordinates 
of the fluorophore were determined from the centroid position and ellipticity 
of the fitted peak, respectively*. Only molecules localized with an x-y precision 
<10 nm (ref. 37), fitting R² > 0.6, and comprising >200 photons were used for 
further analysis. 

To remove the localizations from those fittings of multiple overlapping peaks, we 
developed a rejection criterion based on the shape of peaks. For peaks arising from 
single fluorophores, the fitted widths in x and y (Wx and Wy, respectively) should 
correlate in a manner mainly determined by the cylindrical lens. All localizations 
away from this correlation would come from multiple overlapping or poorly fitted 
peaks and were therefore rejected (Extended Data Fig. 1a–f). 

Single-molecule tracking was employed to remove the overcounted localizations 
from peaks lasting for more than one frame. Tracking was accomplished with 
available algorithms (http://physics.georgetown.edu/matlab/). Particles appearing 
in consecutive frames separated by no more than 200 nm were collapsed into one 
track and considered one molecule by taking only the location in the first frame 
for further analysis. 
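
The collapsing rule described above can be illustrated with a minimal Python sketch. This is not the tracking code used in the study (which relied on the MATLAB routines linked above); the function name `collapse_repeats`, the `(frame, x_nm, y_nm)` tuple layout and the greedy first-match linking are assumptions made purely for illustration.

```python
from math import hypot

def collapse_repeats(locs, max_dist_nm=200.0):
    """Keep only the first-frame localization of each track.

    locs: iterable of (frame, x_nm, y_nm). A localization is linked to a
    track if that track was last seen in the previous frame and lies
    within max_dist_nm (greedy first match, a simplification).
    """
    kept = []      # first-frame localization of every track
    active = []    # last seen (frame, x, y) of tracks that can still grow
    for frame, x, y in sorted(locs):
        match = None
        for i, (f, tx, ty) in enumerate(active):
            if f == frame - 1 and hypot(x - tx, y - ty) <= max_dist_nm:
                match = i
                break
        if match is None:
            kept.append((frame, x, y))       # start of a new track
            active.append((frame, x, y))
        else:
            active[match] = (frame, x, y)    # extend the existing track
        # tracks not seen in this or the previous frame can no longer grow
        active = [t for t in active if t[0] >= frame - 1]
    return kept

example = [(0, 0.0, 0.0), (1, 50.0, 0.0), (2, 100.0, 0.0),
           (2, 1000.0, 1000.0), (4, 100.0, 0.0)]
first_frames = collapse_repeats(example)
```

In this toy input the first three localizations form one track, the distant point in frame 2 starts a second track, and the point in frame 4 starts a third because frame 3 is empty.
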
Analysis of synaptic clusters. A potential synapse could be identified by a juxta- 
posed pair of synaptic proteins in a 2D scatter plot of all accepted localizations from 
both channels. By rotating a 3D scatter plot of localizations of a selected potential 
synapse, we evaluated the data quality and selected only those with clear pre- and 
postsynaptic components (for example, no nearby third cluster which may indi- 
cate two synapses in close proximity) for further analysis. To define the border of 
a synaptic cluster, the nearest neighbour distances (NND) between localizations 
were calculated and the mean + 2 s.d. of NND was used as a cut-off to divide the 
localizations into sub-clusters. All localizations outside of the primary sub-clusters 
were considered to be background and discarded. 
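
One way to picture the nearest-neighbour cut-off is sketched below in Python. It is an illustration under stated assumptions rather than the authors' routine: the text does not say how localizations were grouped once the cut-off was known, so single-linkage grouping (points closer than the cut-off join the same sub-cluster), the use of the sample standard deviation, and the helper names are all hypothetical choices.

```python
from math import dist
from statistics import mean, stdev

def nearest_neighbour_distances(points):
    """NND of every point (brute force; needs at least two points)."""
    return [min(dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def primary_subcluster(points):
    """Return (largest sub-cluster, cut-off) using mean + 2 s.d. of NND.

    Assumption: sub-clusters are the connected groups obtained by linking
    any two localizations separated by no more than the cut-off.
    """
    nnd = nearest_neighbour_distances(points)
    cutoff = mean(nnd) + 2 * stdev(nnd)
    unvisited = set(range(len(points)))
    best = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if dist(points[i], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
            component.extend(near)
            stack.extend(near)
        if len(component) > len(best):
            best = component
    return [points[i] for i in sorted(best)], cutoff

# 3 x 3 grid with 10 nm spacing plus one distant background localization
grid = [(10.0 * i, 10.0 * j, 0.0) for i in range(3) for j in range(3)]
kept, cutoff = primary_subcluster(grid + [(500.0, 500.0, 0.0)])
```
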

Owing to the irregularly curved shapes of some synapses, using the convex hull 
to define synaptic cluster shape would overestimate the synaptic cluster volume. 
We thus defined the synaptic cluster using the alpha shape of the set of 3D localizations 
with α = 150 nm. This value was determined based on a series of tests on 
>100 synapses to obtain the best synaptic cluster shape while avoiding dramatic 
changes in volume when individual localizations near the border were added or 
removed. This alpha shape algorithm gave a synaptic cluster volume of 81 ± 3% 
of the convex hull volume (n = 156 synapses). Subsequently, this alpha shape was 
used as the cluster border when localizations were randomized. 

A synaptic cluster was only considered for analysis if the volume was between 
2 × 10⁻³ µm³ and 30 × 10⁻³ µm³ (ref. 38), and contained an average density of 
>8 × 10° localizations/µm³. Local density was defined as the number of molecules 
within a radius of 2.5 times the standard median nearest neighbour distance 
(MdNND) for the synaptic cluster density. The standard MdNND was calculated 
from a standard correlation curve, MdNND = … (unit per 100 nm voxel for 
d), where d is the averaged localization density. This equation is derived from fitting 
MdNND with d in a series of simulations of uniformly distributed synaptic clusters 
with different densities. The reason we used this standard MdNND instead of the 
median NND from the original synaptic cluster was to reduce the deviation caused 
by local assemblies. 
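
For orientation only, the median nearest-neighbour distance of an idealized, unbounded uniform (Poisson) distribution of points in three dimensions has a closed form. This is a textbook result and not the fitted curve used here, which was obtained from simulations of finite synaptic clusters and may differ because of border effects and the choice of units.

```latex
P(\mathrm{NND} > r) = \exp\!\left(-\tfrac{4}{3}\pi d\, r^{3}\right)
\;\Rightarrow\;
\mathrm{MdNND}_{\mathrm{Poisson}}
  = \left(\frac{3\ln 2}{4\pi d}\right)^{1/3}
  \approx 0.549\, d^{-1/3}
```

Setting the survival probability to 1/2 and solving for r gives the median; the inverse cube-root dependence on density d is the scaling one would expect any simulation-based curve to approximate.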
Nanocluster analysis. Localizations with local densities >14 were selected and 
divided into agglomerative sub-clusters with a node height cut-off of 40 nm using 
MATLAB functions linkage() and cluster(). For each sub-cluster, we then calculated 
the NND and discarded those localizations with NND > MdNND if any. 
Only those sub-clusters containing >4 localizations were counted as nanoclusters. 
These criteria were chosen based on a conservative strategy such that no nanoclusters 
were identified in simulations of randomly distributed synaptic clusters 
with different densities. Consequently, they may have prevented detection of small 
or weakly enriched nanoclusters. In principle, we cannot completely exclude the 
possibility of overcounting, so a certain fraction of detected nanoclusters are potentially 
artificial. However, we used the same standard on all data sets. Since all 
the trans-synaptic analyses were well controlled by randomizing simulations, this 
contamination is not able to produce false positives for trans-synaptic alignment 
analyses. On the contrary, it may attenuate the significance of the differences in 
trans-synaptic analyses based on nanoclusters, including cross-correlation, protein 
enrichment and the fraction of enriched nanoclusters. 

Since the number of localizations in one nanocluster was typically small, using 
convex hull or alpha shape would greatly under-estimate the nanocluster volume 
due to the border effect. Therefore, we tessellated the synaptic cluster with polyhedrons 
using MATLAB function voronoin(), with each Voronoi cell containing one 
localization. The nanocluster volume was calculated as a summation of volumes 
of all polyhedrons containing the nanocluster localizations. To avoid unexpected 
unbounded Voronoi cells and over-estimating the volume of cells near the cluster 
surface, we introduced ~10% background noise by adding randomly distributed 
localizations around the cluster!°. Polyhedron volume for each localization was 
averaged across ten independent simulations. 
ACF analysis. To quantify the self-clustering of synaptic proteins, we adapted an 
autocorrelation function*®” for our 3D data. The autocorrelation function $g_a(\vec{r})$ is 
a measure of density correlations, which reports increased probability of finding 
a second localized signal a distance r away from a given localized signal. It was 
tabulated in MATLAB using Fast Fourier Transforms (FFTs), as in equation (1). 

$$g_a(\vec{r}) = \frac{\mathrm{FFT}^{-1}\left(|\mathrm{FFT}(I)|^{2}\right)}{\rho^{2}\,\mathrm{FFT}^{-1}\left(|\mathrm{FFT}(W)|^{2}\right)} \qquad (1)$$

$\mathrm{FFT}^{-1}$ is an inverse Fast Fourier Transform, I is the reconstructed 3D density 
matrix of localized fluorophores (pixel size of 5 nm), $\rho$ is the general localization 
density inside the synaptic cluster, and W is a shape function that has the value of 1 
inside the synaptic cluster as defined above with an alpha shape and the value of 0 
elsewhere. The matrix I was padded with zeros in all three directions out to a 
distance larger than the range of the desired correlation function (we used 200 nm) 
to avoid artefacts due to the periodic nature of FFT functions. W was also padded 
by an equal number of zeros. $\mathrm{FFT}^{-1}(|\mathrm{FFT}(W)|^{2})$ is a normalization factor accounting 
for the general shape of the synaptic cluster itself so that the output of $g_a(\vec{r})$ 
represented only the internal structure of the measured synaptic cluster. $g_a(\vec{r})$ was 
symmetric to rotations around the centre of matrix C ($x_c$, $y_c$, $z_c$), and it could be 
averaged over angles to obtain $g_a(r)$ by converting to polar coordinates. $g_a(r)$ was 
then binned by radius (r). Correlation functions were plotted for r > 0, as $g_a(r = 0)$ 
was a trivial contribution. 

For a uniform distribution, for example, when all localizations were uniformly 
randomized within the alpha shape, $g_a(r) = 1$ (Fig. 1d). Any heterogeneity will 
result in $g_a(r) > 1$. The extent of $g_a(r)$ over 1, that is, $r_0$ for $g_a(r_0) = 1$, is related 
to the pattern size of the internal heterogeneity (Extended Data Fig. 2b, c)*’. 

Isolated, non-synaptic small groups of localizations were taken from our 
experimental data. These localization groups likely represent an overestimate of 
a single-dye-molecule localization spread. Nevertheless, we find that they are still 
significantly smaller than the large majority of the nanoclusters we detected. 
Imaging vesicle exocytosis. For imaging vesicle fusion, vGluT-pHluorin-mCherry 
(a gift from T. Ryan)***!, was cotransfected with Syn1a–CFP (a gift from 
G. Augustine) using Lipofectamine 2000 (Invitrogen) for 4-6 days before imaging 
cells at 14-20 DIV. Optical measurements were performed using a laminar-flow 
perfusion and stimulation chamber. Images were acquired at 10 Hz with an Andor 
iXon 887 EM-CCD camera on an Olympus IX81 ZDC2 inverted microscope with 
a 100x/1.49 TIRF oil-immersion objective. Temperature was controlled using an 
objective heater set at either room temperature (~25°C) or 32°C. Action poten- 
tials were evoked by passing 1 ms current pulses yielding fields of 10 V/cm via 
platinum-iridium electrodes. Terminals were selected for imaging by assessing 
their responsiveness, as indicated by a fluorescence increase, to a 10 AP train 
at 20 Hz. A wide-field Syn1a image was then taken at the imaging plane. Single 
AP-evoked release was measured over 60 trials of (1) 1 s acquisition of baseline 
fluorescence, (2) stimulus, (3) 2.5s acquisition of post-stimulus fluorescence, 
(4) 7 s recovery during which the laser is off. Spontaneous release was measured 
over 5 min of continuous acquisition. Cells were imaged in a saline solution 
containing 120 mM NaCl, 3 mM KCl, 2 mM CaCl₂, 2 mM MgCl₂, 10 mM glucose, 
and 10 mM HEPES, pH adjusted to 7.4 with NaOH, 10 µM 6,7-dinitroquinoxaline-2,3-dione 
(DNQX; Sigma), 50 µM D,L-2-amino-5-phosphonovaleric acid 
(AP5; Sigma), and 500 nM jasplakinolide (Jasp; Millipore) at room temperature 
(~25°C). When higher [CaCl₂] was used, [MgCl₂] was reduced to keep the divalent 
ion concentration constant. For measurements of spontaneous events, 500 nM 
tetrodotoxin (TTX; Enzo) was added after identifying terminals using AP-evoked 
fluorescence increase. 

For calculating normalized changes in fluorescence (ΔF/F), images were analysed 
in ImageJ by custom-written plugins!”. Average fluorescence intensities were 
measured over a circular region of interest (ROI) of radius 800 nm for each bouton. 
Change in fluorescence (ΔF) was calculated as the difference in intensity of the 
frame after the stimulus was delivered and the average ROI intensity of 5 baseline 
frames not including the first frame or the frame immediately before the stimulus 
(F_baseline). ΔF/F was calculated by normalizing each ΔF to F_baseline. 
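
A minimal Python sketch of this normalization is given below. The text does not state which 5 baseline frames were used beyond the two exclusions, so the sketch assumes the 5 frames ending just before the excluded pre-stimulus frame; the function name and arguments are hypothetical and this is not the ImageJ plugin used in the study.

```python
def delta_f_over_f(roi_trace, stim_frame, n_baseline=5):
    """Normalized response of one bouton to a single stimulus.

    roi_trace: mean ROI intensity per frame.
    stim_frame: index of the first frame acquired after the stimulus.
    The baseline skips frame 0 and the frame immediately before the
    stimulus (assumed window: the n_baseline frames just before that).
    """
    end = stim_frame - 1            # exclusive: drops the last pre-stimulus frame
    start = end - n_baseline
    if start < 1:                   # frame 0 must not enter the baseline
        raise ValueError("not enough baseline frames before the stimulus")
    f_baseline = sum(roi_trace[start:end]) / n_baseline
    return (roi_trace[stim_frame] - f_baseline) / f_baseline

# 1 s of baseline at 10 Hz (frames 0-9), response in frame 10
trace = [90, 100, 100, 100, 100, 100, 100, 100, 100, 130, 120]
response = delta_f_over_f(trace, stim_frame=10)
```

Here frames 4 to 8 define the baseline (100), so the outlying values in frame 0 and frame 9 do not affect the result.
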
pHuse localization and analysis. Data analysis was performed offline using cus- 
tom routines in MATLAB (Mathworks). Boundaries for individual boutons were 
determined using wide-field images of Syn1a–CFP centred at the focal plane of 
the pHuse experiments thresholded at 50% of the peak intensity (33% and 67% 
thresholds were also compared and showed no significant difference on the effect 
of mode of release, shown in Extended Data Fig. 5i). Binary images were created 
from the thresholded image, and Syn1a–CFP puncta area calculated as a measure 
of bouton area, which correlated with pHuse area, as expected”. Images for 
each fusion event were processed using frame-by-frame subtraction followed by 
background subtraction to isolate fluorescence increases (Fig. 2d)**. Similar detection 
thresholds were set for spontaneous (75 ± 15) and evoked (78 ± 14, t = 0.88, 
P=0.40) release, at ~3-4 times above background noise, on an individual imag- 
ing field basis. Spatial localization of the fusion events was determined by fitting 
an elliptical 2-dimensional Gaussian function to a 9 x 9 pixel array surrounding 
the peak. Only molecules localized with a precision <25nm*”, elliptical form 
<1.3, and comprising >100 photons were used for further analysis. An additional 
criterion to exclude evoked pHuse localizations with photon counts > mean + 2 s.d. 
of spontaneous photon count distribution was used in Extended Data Fig. 5d and 
showed no significant difference compared to the distribution lacking this crite- 
rion. Localizations from multiple fusion events over time at individual boutons 
were mapped. A 2D convex hull algorithm was used to calculate the minimal 
convex polygon that incorporated all fusion site localization points. The area of 
the resulting polygon was used as the fusion site (pHuse) area. 
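
The fusion-site area calculation can be sketched in Python as a 2D convex hull followed by the shoelace formula. The study's own MATLAB routine is not shown in the text, so the hull algorithm chosen here (Andrew's monotone chain) and the function names are assumptions for illustration.

```python
def convex_hull(points):
    """Vertices of the 2D convex hull in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula; returns 0 for fewer than three vertices."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def phuse_area(fusion_sites_nm):
    """Area (nm^2) of the minimal convex polygon around all fusion sites."""
    return polygon_area(convex_hull(fusion_sites_nm))

sites = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0), (50.0, 50.0)]
area = phuse_area(sites)
```
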

Photon count distributions analysis. Data analysis was performed offline using 
custom routines in MATLAB (Mathworks). The distribution for spontaneous 
fusion events was fit with a normal distribution using normfit(), which uses maximum 
likelihood estimation for optimization. The distribution of evoked fusion 
events was fit with a custom univariate distribution for a mixture of two normal 
distributions with a probability density function (pdf) defined in equation (2). This 
fitting also used maximum likelihood estimation for optimization of five parameters, 
including the mixture probability (p), and the population means ($\mu_1$, $\mu_2$) and variance 
($\sigma_1$, $\sigma_2$) for each component, over 300 iterations using normpdf() to compute 
the pdf for each of the two component normal distributions. 

$$\mathrm{pdf} = p \times \mathrm{normpdf}(x, \mu_1, \sigma_1) + (1 - p) \times \mathrm{normpdf}(x, \mu_2, \sigma_2) \qquad (2)$$

Here p was constrained between 0 and 1, and $\sigma$ had a lower bound of 0. This mixture 
probability defined the lower estimate (72%) for the percentage of single stimulus 
evoked fusion arising from single vesicles. We calculated the higher estimate 
(82%) by calculating the percentage of evoked fusion events with photon counts 
within two standard deviations of the mean spontaneous fusion event photon 
count. To assess the influence of multivesicular events on evoked pHuse area, we 
used this as a cut-off to exclude localizations above this photon count. We found 
no significant difference between evoked area with and without excluding these 
events (Extended Data Fig. 5d). 
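
The two-component density of equation (2) and the quantity minimized in a maximum likelihood fit can be written out as a short Python sketch. It covers only the objective function, not the optimizer or the 300-iteration setting used in the study; the function names are hypothetical, and the width parameters are treated as standard deviations, which is what MATLAB's normpdf(x, mu, sigma) expects.

```python
from math import exp, log, pi, sqrt

def normpdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

def mixture_pdf(x, p, mu1, sigma1, mu2, sigma2):
    """Equation (2): weight p on component 1, (1 - p) on component 2."""
    return p * normpdf(x, mu1, sigma1) + (1.0 - p) * normpdf(x, mu2, sigma2)

def neg_log_likelihood(data, p, mu1, sigma1, mu2, sigma2):
    """Objective for a maximum likelihood fit of the five parameters.

    Returns infinity outside the constraints 0 <= p <= 1 and sigma > 0.
    """
    if not 0.0 <= p <= 1.0 or sigma1 <= 0.0 or sigma2 <= 0.0:
        return float("inf")
    floor = 1e-300  # guards log() against densities that underflow to 0
    return -sum(log(max(mixture_pdf(x, p, mu1, sigma1, mu2, sigma2), floor))
                for x in data)

# Toy photon counts: most events near 100, a minority near 200
counts = [100.0, 102.0, 98.0, 200.0, 205.0]
good = neg_log_likelihood(counts, 0.6, 100.0, 2.0, 202.0, 3.0)
poor = neg_log_likelihood(counts, 0.6, 150.0, 2.0, 150.0, 3.0)
```

Any general-purpose bounded minimizer could be applied to `neg_log_likelihood`; the fitted p would then play the role of the mixture probability described above.
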

Ca²⁺ imaging and analysis. For Ca²⁺ imaging, the genetically encoded indicator 
GCaMP6f (ref. 45) was transfected at 14 DIV and imaged 3 days after transfection. 
GCaMP6f was used to detect postsynaptic miniature spontaneous Ca²⁺ transients 
(mSCaTs) that arose in dendritic spines following NMDA receptor activation by 
spontaneous release⁴⁶. Coverslips were placed in custom-made chambers in saline 
solution containing 1 µM TTX, 10 µM DNQX, 25 µM picrotoxin (Sigma), and 5 µM 
nifedipine (Sigma). Imaging was performed on a spinning disk confocal system 
(Andor Technology), consisting of a CSU-22 confocal (Yokogawa) with a Zyla 4.2 
CCD camera detector (Andor) mounted on the side port of an Olympus IX-81 
inverted microscope, using a 60×/1.42 oil-immersion objective, yielding a final 
effective pixel size of 108 nm. Continuous acquisition at 20 Hz was collected for 
3 min, controlled by iQ software (Andor). 

Data analysis was performed offline using custom routines in Metamorph 
(Molecular Devices), Clampex (Molecular Devices), and Matlab (Mathworks). 
First, using Metamorph, a baseline image was created by averaging the first three 
and last three image frames and a maximum intensity projection was made by 
averaging all image frames. Image subtraction of the baseline from the maximum 
intensity projection revealed spines that showed an increase in GCaMP inten- 
sity. Regions of interest (ROIs) were drawn around these ‘active’ spines as well 
as a background region and then transferred to the original timelapse. For each
ROI the averaged intensity was measured per frame. The average intensity of the 
background ROI was subtracted from the average intensity of ‘active’ spine ROIs. 
From this, an average fluorescence intensity was calculated for every 10 frames, and 
within every minute interval of imaging the lowest positive value was used as the 
baseline fluorescence intensity for that minute (F_baseline,1 min). A normalized change
in fluorescence (ΔF/F) was calculated for each frame as (F_frame − F_baseline,1 min)/
F_baseline,1 min. The ΔF/F values were then fed into Clampex, and mSCaTs were
detected using a template search that identified peaks based on a shape profile 
determined from mSCaT examples with near-average rise and decay time courses. 
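The baseline and normalization steps can be sketched in Python as follows. This is an illustrative reimplementation rather than the Metamorph and Matlab routines used here. It assumes the input is a background-subtracted intensity trace for one spine ROI, and that one minute corresponds to 1,200 frames at 20 Hz. The function name and defaults are hypothetical.

```python
def delta_f_over_f(trace, frames_per_block=10, frames_per_minute=1200):
    """Return the per-frame normalized change in fluorescence.

    Within each one-minute interval, the trace is averaged in blocks of
    frames_per_block frames and the lowest positive block average is used
    as the baseline for every frame in that minute.
    """
    result = []
    for start in range(0, len(trace), frames_per_minute):
        minute = trace[start:start + frames_per_minute]
        block_means = []
        for i in range(0, len(minute), frames_per_block):
            block = minute[i:i + frames_per_block]
            block_means.append(sum(block) / len(block))
        positive = [m for m in block_means if m > 0]
        if not positive:
            raise ValueError("no positive block average in this interval")
        baseline = min(positive)
        result.extend((f - baseline) / baseline for f in minute)
    return result


# Toy trace: a flat baseline of 10 with a 10-frame transient of 20.
example_trace = [10.0] * 20 + [20.0] * 10 + [10.0] * 10
example_dff = delta_f_over_f(example_trace, frames_per_minute=40)
```

Using the lowest positive block average, rather than a global minimum, keeps the baseline robust to single noisy frames and to slow drift between minutes.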
Confocal imaging of presynaptic proteins. Neurons 14-20 DIV were cotransfected
for 3 days with RIM1-mVenus (a gift from P. Kaeser) and Syn1a-CFP to
assess colocalization. Neurons transfected with only RIM1-mVenus were immu- 
nostained with chicken anti-GFP (1:200, Chemicon) labelled with secondary 
anti-chicken-Alexa-488, rabbit anti-RIM1/2 labelled with secondary anti-rabbit- 
Cy3, and mouse anti-Bsn labelled with secondary anti-mouse-Alexa-647 to assess 
expression levels. Imaging was performed on a spinning disk confocal system as 
described above. ImageJ was used to analyse fluorescence intensity of RIM1/2 and 
Bsn at transfected compared to neighbouring untransfected boutons. 
PALM-pHuse. RIM1-mEos3.1 was constructed by subcloning mEos3.1 from
mEos3.1-N1 (a gift from S. McKinney) into pCMV5-RIM1-mVenus (P. Kaeser)
in place of mVenus at NotI-AscI. PALM was performed on RIM1, and nanoclusters
were identified using local density measured by nearest neighbour distance as previously
described, or using an adapted form of SR-Tesseler first rank neighbour density
(δ¹), using 2× mean δ¹ of the whole synapse as the threshold for identifying nanoclusters,
as described in ref. 15. Nanoclusters identified by both methods were
similar in size (Extended Data Fig. 6). To map vesicle fusion to active zone nanostructure,
RIM1-mEos3.1 was cotransfected with vGpH at 10-14 DIV and imaged
at 14-18 DIV. RIM1 PALM and pHuse of 1-AP-evoked release was performed as 
described above sequentially on the same boutons. Overlapping RIM1 and pHuse 
localizations were analysed at boutons containing >10 RIM1 localizations and 
>3 pHuse localizations offline using custom routines in MATLAB (Mathworks). 
vGpH fluorescence increase following a 10 AP-train stimulus was used to outline 
the border of individual boutons. Randomized distributions of RIM1 were sim- 
ulated for each synapse by randomly placing the same number of RIM1 localiza- 
tions within the same area of RIM1 as calculated by convex hull of the measured 
RIM1 distribution. RIM1 local density within these randomized distributions was
similarly calculated. Normalized RIM1 δ¹ was calculated with respect to overall
synaptic localization density. 
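The randomization step, placing the same number of localizations uniformly within the convex hull of the measured distribution, can be sketched as below. This is a minimal 2D Python example under stated assumptions: the hull is computed with Andrew's monotone chain and points are drawn by rejection sampling from the bounding box. The function names and the toy coordinates are hypothetical and do not reproduce the MATLAB routines used in the study.

```python
import random


def cross(o, a, b):
    """z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, q):
    """True if q lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


def randomize_within_hull(points, rng):
    """Draw len(points) uniform random positions inside the convex hull."""
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear localizations")
    xs = [p[0] for p in hull]
    ys = [p[1] for p in hull]
    randomized = []
    while len(randomized) < len(points):
        q = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if inside_hull(hull, q):  # rejection sampling keeps the draw uniform
            randomized.append(q)
    return randomized


# Toy localizations in nm (hypothetical values).
localizations = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
hull = convex_hull(localizations)
randomized = randomize_within_hull(localizations, random.Random(1))
```

Local density can then be computed on `randomized` with the same routine applied to the measured localizations, which gives the randomized reference distribution.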

3D paired cross-correlation function (PCF) analysis. The 3D PCF was adapted 
from a similar function previously used to quantify colocalization in 2D data”. 
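It was computed using two matrices (I₁ and I₂) reconstructed from two image
channels (equation (3)).

$$g_c(\vec{r}) = \mathrm{Re}\left\{ \frac{\mathrm{FFT}^{-1}\left(\mathrm{FFT}(I_1) \times \mathrm{conj}[\mathrm{FFT}(I_2)]\right)}{\rho_1 \rho_2\, \mathrm{FFT}^{-1}\left(\mathrm{FFT}(W_1) \times \mathrm{conj}[\mathrm{FFT}(W_2)]\right)} \right\} \qquad (3)$$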


Here, conj[] is a complex conjugate, ρ₁ and ρ₂ are the averaged localization densities
in the pair of synaptic clusters, W₁ and W₂ are shape functions of the two
synaptic clusters, and Re{} indicates the real part. Different from the ACF, the
symmetric origin of g_c(r⃗) here is no longer the matrix centre C (x_c, y_c, z_c), but a
different point A (x, y, z), and the vector CA represents the direction and distance
for the translation of PSD-95 synaptic clusters (I₂) to get the best overlap with
presynaptic clusters (I₁). We computed the direct correlation between I₁ and I₂
with equation (4).


$$G = \mathrm{FFT}^{-1}\left(\mathrm{FFT}(I_1) \times \mathrm{conj}[\mathrm{FFT}(I_2)]\right) \qquad (4)$$


A is the point with the peak G value. Because the originally constructed matrices
I₁ and I₂ were not continuous, to reduce the noise of the correlation, we first convolved
the two matrices with an 11 × 11 × 11 kernel (Extended Data Fig. 1g). To
avoid having the correlation be dominated by local domains with high localization
density, we cut the peaks of the convolved matrices to 1/4 of the mean localization
density within synaptic clusters (ρ₁/4 and ρ₂/4) (Extended Data
Fig. 1h) so that G only represented the relationship between the general 3D shapes
of the two synaptic clusters (I₁′, I₂′) without internal heterogeneity (Extended Data
Fig. 1m, n). Around A, g_c(r⃗) is symmetric and could be angularly averaged to
get g_c(r).

Since the information of synaptic cluster shape and overall density had been
normalized, g_c(r) was fully dependent on the internal organizations of the
two synaptic clusters. If localization assemblies inside the two synaptic clusters
organized in a similar pattern and opposed each other, g_c(r) > 1. If either synaptic
cluster had a uniform distribution of localizations (Fig. 3b) or the internal assemblies
were not aligned (Fig. 3c), g_c(r) = 1. Different from the ACF, overcounting
has no effect on the PCF.
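To make the role of the direct correlation G in equation (4) concrete, the sketch below finds the translation that best overlaps two small 3D matrices. It is a simplified Python illustration, not the analysis code. It evaluates the circular cross-correlation by direct summation, which by the correlation theorem equals FFT⁻¹(FFT(I₁) × conj[FFT(I₂)]) for real-valued matrices with periodic boundaries, and it omits the smoothing and peak clipping steps. The sparse dictionary representation, function names, and toy values are assumptions.

```python
from itertools import product


def circular_cross_correlation(i1, i2, shape):
    """G[s] = sum over x of i1[x + s] * i2[x], with periodic wrap-around.

    i1 and i2 are sparse 3D matrices stored as {(x, y, z): value}.
    """
    g = {}
    for shift in product(*(range(n) for n in shape)):
        total = 0.0
        for voxel, value2 in i2.items():
            moved = tuple((v + s) % n for v, s, n in zip(voxel, shift, shape))
            total += i1.get(moved, 0.0) * value2
        g[shift] = total
    return g


def peak_shift(g):
    """Shift with the highest correlation, i.e. the point A in the text."""
    return max(g, key=g.get)


# Toy example: i1 is i2 translated by (1, 2, 0) on a 4 x 4 x 4 grid.
grid_shape = (4, 4, 4)
i2 = {(0, 0, 0): 3.0, (1, 0, 0): 1.0, (0, 1, 1): 2.0}
i1 = {(1, 2, 0): 3.0, (2, 2, 0): 1.0, (1, 3, 1): 2.0}
g_direct = circular_cross_correlation(i1, i2, grid_shape)
best_shift = peak_shift(g_direct)
```

The recovered shift is the translation that moves I₂ onto I₁. Direct summation scales poorly with matrix size, so for realistically sized volumes an FFT-based implementation is the practical choice.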

Protein enrichment analysis. The protein enrichment profile of protein A relative
to a protein-B nanocluster, E_{A→B}(r), was calculated as the angularly averaged localization
density of protein A around the aligned centre of a protein-B nanocluster
normalized to the average localization density in synaptic cluster A. The aligned 
nanocluster centre was found as shown in Extended Data Fig. 1. To avoid potential 
problems caused by boundary conditions, we calculated the enrichment profile as 
equation (5). 


$$E_{A \to B}(r) = \frac{N_{A \to B}(r)}{N_{A_r(m) \to B}(r)} \times m \qquad (5)$$


N_{A→B}(r) is the binned distribution of protein-A localization number to the aligned
protein-B nanocluster centre, N_{A_r(m)→B}(r) is the distribution of localization number
for a uniformly randomized synaptic cluster A with m times the original localization
density, and m is a factor set to 15 to reduce the effect of fluctuations.
A protein-B nanocluster was considered to be significantly enriched with protein
A if E_{A→B}(r) > mean[E_{A_r→B}(r)] + 1.96 × standard deviation[E_{A_r→B}(r)], where
E_{A_r→B}(r) represents the enrichment profile of ten simulated uniformly randomized
A synaptic clusters with the original density and the same alignment to the nanocluster
centre of protein-B.
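A compact Python sketch of the enrichment calculation and the significance criterion is given below. It assumes the binned counts have already been computed, uses the sample standard deviation for the threshold (the choice between sample and population standard deviation is an assumption), and all names and numbers are hypothetical.

```python
from statistics import mean, stdev


def enrichment_profile(n_measured, n_randomized_m, m=15):
    """Equation (5): per-bin enrichment of protein A around a protein-B centre.

    n_measured: binned localization counts for the measured cluster.
    n_randomized_m: binned counts for a uniformly randomized cluster with
    m times the original localization density.
    """
    profile = []
    for measured, randomized in zip(n_measured, n_randomized_m):
        if randomized > 0:
            profile.append(m * measured / randomized)
        else:
            profile.append(float("nan"))  # empty reference bin
    return profile


def is_enriched(e_measured, e_simulated):
    """True if the measured value exceeds mean + 1.96 x s.d. of simulations."""
    threshold = mean(e_simulated) + 1.96 * stdev(e_simulated)
    return e_measured > threshold


# Hypothetical counts in two radial bins.
example_profile = enrichment_profile([4, 2], [30, 30])
# Hypothetical enrichment values from ten randomized clusters at one radius.
simulated = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.9, 1.1, 1.0]
```

Dividing by a reference cluster simulated at m times the original density, and then multiplying by m, reduces the bin-to-bin fluctuation of the denominator without changing its expected value.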

Chemical LTP and LTD. Chemical LTP was performed using a combination
of AP5 withdrawal and application of glycine as described in ref. 19. Briefly,
3-4-week-old cultures were treated with 200 μM DL-AP5 in culture medium
for two days and then transferred to ACSF (150 NaCl, 3 KCl, 2 CaCl₂, 1 MgCl₂,
10 HEPES-Na, 10 D-glucose, all in mM, pH 7.4) with 100 μM picrotoxin, 1 μM
strychnine, 0.5 μM TTX and 200 μM AP5. After preincubation for 1-2 h, chemical
LTP was induced with 15 min incubation in a similar solution with 200 μM glycine
but without Mg²⁺ and AP5. Neurons were fixed directly following induction.
Chemical LTD was performed using application of NMDA as described in ref. 20. 
Control solutions of regular saline solution or co-application with AP5 were paired 
with experimental conditions. Cells were fixed either immediately after plasticity 
induction or washed with saline and incubated for 25 min at 37°C to allow recovery 
before fixing. Cells were then immunostained and imaged as described above. 
Synaptic modelling. We used an experimentally constrained deterministic 
approach to study the dependence of synaptic strength on the spatial distribution 
of release sites and AMPARs. Central to this approach is the relationship between 
channel opening probability and its distance from a release site, determined previously
by stochastic modelling approaches:

$$P(r) = 0.42\, e^{-r/88} \qquad (6)$$


where r is the lateral distance between an AMPAR and a release site (in nm). In 
brief, the distribution of RIM1/2 proteins and GluA2/3-containing AMPA recep- 
tors measured by STORM were used to determine the spatial coordinates of release 
sites and AMPARs on a model synapse. Since the precise photophysics and blink 
distribution of dyes are complicated and the exact efficiency of antibody labelling is 
unknown, we calculated gradient maps of spatial coordinates to determine putative 
RIM1/2 protein and AMPAR locations from the single-molecule images. First, the 
3D spatial coordinates were projected onto 2D planes orthogonal to the manu- 
ally determined axodendritic axis. Each projected point was assigned a Gaussian 
function, the amplitude and width of which were determined by the normalized 
local density and the lateral STORM localization precision (20 nm). Overlapping
Gaussian functions within the active zone or PSD convex hull were integrated to
create the pre- and postsynaptic gradient maps. The sampling pixel size was 2.5 nm
(the calculated synaptic response was independent of pixelation level for sampling
size from 1 to 20 nm, data not shown). The pre- and postsynaptic gradient maps
were separated by 20 nm, the cleft distance used to determine equation (6).

The model synaptic response for a single synapse was computed as the expected 
fraction of receptors that would open given a single release, averaged over all pos- 
sible release locations in the active zone. For any single release event, the expected 
open fraction of channels at the peak of the response was calculated as follows: 


$$O(i) = \frac{\sum_j P(r_{ij}) \times LD_j}{\sum_j LD_j} \qquad (7)$$


where r_ij is the lateral distance between the ith pixel in the presynaptic gradient
map and the jth pixel in the postsynaptic gradient map; the expected fraction of
open channels O(i) from the ith release site is the sum of channel opening probabilities
at all pixels in the postsynaptic gradient map, where each jth pixel is weighted by
its normalized local density LD_j (that is, the channel fraction is assumed to be
directly proportional to the channel local density). To constrain the location of
release events in the active zone, we used the live-cell pHuse-PALM data, which 
showed that release events preferentially occurred in regions with normalized RIM 
local density greater than 1.5, and these events occurred over 20-60% of the active 
zone area (spontaneous pHuse area/PALMed RIM area, and evoked pHuse area/ 
spontaneous pHuse area). To account for these measured features, we modelled 
the spatial likelihood of release as a piecewise sigmoidal function dependent on 
the normalized local RIM density: 


a = or : 
0.5 1D, > if LD; € [0, LDingect] 

aan LDinflect 

Pi | release) = (8) 
(1 _ s) LD; — LDinflect 
LDmax — LDinflect : 
0.5+0.5 - De oa if LD; € [LDingects LDmax] 
— 5 —LPi= Winflect_ 


LDmax — LDinflect 


where s is the steepness of the sigmoid transition, LD_i is the normalized local
density of RIM at the ith pixel of the presynaptic gradient map, LD_inflect is the point
of inflection in the sigmoidal function, and LD_max is the maximum normalized
local density of RIM in the STORM measured example shown in Extended Data
Fig. 8b. LD_inflect and s were fitted to be 1.5 and 0.959 in order to yield a fractional
release area of 40%. To calculate the average peak synaptic response per release,
we calculated the expected open channel fraction averaged over all possible release
sites weighted by the spatial probabilities of release:


$$\text{Open channels at peak response (\%)} = \frac{\sum_i O(i) \times P_r(i \mid \text{release})}{\sum_i P_r(i \mid \text{release})}$$
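The deterministic calculation can be sketched in Python as follows. The sketch assumes that the receptor weights in equation (7) are normalized by their sum, takes the release probabilities as given inputs rather than computing them from equation (8), and represents each gradient map as a list of pixel coordinates (in nm) with a weight. The function names and the toy values are hypothetical.

```python
import math


def p_open(r_nm):
    """Equation (6): channel opening probability at lateral distance r (nm)."""
    return 0.42 * math.exp(-r_nm / 88.0)


def open_fraction(release_xy, post_pixels):
    """Expected open fraction O(i) for one release site (equation (7)).

    post_pixels: list of ((x, y), ld) for the postsynaptic gradient map,
    where ld is the normalized local receptor density at that pixel.
    """
    total_ld = sum(ld for _, ld in post_pixels)
    weighted = sum(p_open(math.dist(release_xy, xy)) * ld for xy, ld in post_pixels)
    return weighted / total_ld


def mean_open_fraction(pre_pixels, post_pixels):
    """Average O(i) over release sites, weighted by release probability.

    pre_pixels: list of ((x, y), p_release) for the presynaptic gradient map.
    """
    total_p = sum(p for _, p in pre_pixels)
    weighted = sum(open_fraction(xy, post_pixels) * p for xy, p in pre_pixels)
    return weighted / total_p


# Toy synapse: one receptor pixel and two equally likely release sites.
post_map = [((0.0, 0.0), 2.0)]
pre_map = [((0.0, 0.0), 1.0), ((88.0, 0.0), 1.0)]
response = mean_open_fraction(pre_map, post_map)
```

In the toy example, the release site directly opposite the receptor pixel contributes 0.42 and the site 88 nm away contributes 0.42/e, which shows how lateral misalignment reduces the expected response.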


Code availability. All code used in the paper is available upon request. 

Statistical analysis. No statistical methods were used to predetermine sample 
size. The experiments were not randomized, and investigators were not blinded 
to allocation during experiments and outcome assessment. Statistical tests were 
performed with Sigmastat, MATLAB, Graphpad, or R. The sample sizes were determined based on
numbers reported in previous studies. For comparison of two or more distributions,
all samples were assessed for normality using Shapiro-Wilk or Kolmogorov-Smirnov
tests. If samples met criteria for normality, we used a Student's t-test to
compare two groups, a paired t-test for comparison of the same group before and 
after a treatment, or ANOVA for more than two groups. If ANOVAs were sig- 
nificant, we used a post hoc Tukey test to compare between groups. For groups 
with combinations of discrete and continuous variables, we used MANCOVAs. 
We only performed two-tailed tests. Homogeneity of variances was tested using 
an F-test and found to be similar between compared groups. If samples did not 
meet criteria for parametric tests, we used Kolmogorov-Smirnov or Wilcoxon 
rank-sum tests for comparison of two groups and Kruskal-Wallis or Friedman 


ANOVA for comparison of more than two groups, with post hoc analysis using 
Dunn's test. Data are presented as mean ± s.e.m. unless otherwise specified. Also
see Supplementary Tables. 
Extended Data Figure 1 | Filtering of localizations and automatic
algorithm to detect the synaptic axis. a, Scatter plot of fitted peak width
in y (W_y) against that in x (W_x). The colour codes the position in z. All
localizations away from this centre dense region arise from multiple
overlapping or poorly fitted peaks and should be rejected. b, The ellipticity
(W_x/W_y) and the width difference (W_x − W_y) formed an approximate
linear relationship when W_x > W_y (dotted box). c, We fitted the ratios
between ellipticity and the width difference to the denominators with third
degree polynomial functions (black line) and rejected all localizations out
of 95% confidence intervals (grey lines) of the curve (>1.96 × s.d.). The
same criteria were applied to the other fraction of localizations with
W_x < W_y. d, The same scatter plot as in a after rejection of all of the diffuse
localizations (about 20-25%). e, f, The filtering protocol cleared up most 
of the localizations from multiple overlapping peaks or poorly fitted peaks, 
including most of the non-relevant background localizations (e) and those 
localizations with poorly calibrated z positions (f). Scale bars, 2 μm (e) and
200 nm (f). The synapse in f corresponds to the boxed synapse in e. 

g, A 2D section through the centre of the convoluted constructed 3D 
distribution matrix of a synapse. h, Peak density of the matrix set to a 
quarter of the mean molecule density of the synaptic cluster. i, 2D section
at the same position of the 3D matrix of direct cross-correlation of the two 
channels (equation (3) in Methods). C is the centre of matrix, and A is the 
peak of the cross-correlation. j-k, Best overlap of the two synaptic clusters 
after PSD-95 was moved in 3D space along the vector CA. l, 3D scatter
plots of the synapse in two different view angles. The arrow denotes the 
vector and the extended line (dotted) represents the synaptic axis. m, 3D 
plot of detected synaptic axis when the positions of high-density peaks in 
RIM1/2 (nanoclusters) were randomized within the synaptic cluster. This 
simulation was performed 35 times, but only 10 representative results are 
presented here to avoid overlapping. The red denotes the synaptic axis of 
the original synaptic cluster. n, Averaged distance between the detected 
C,, positions from 35 simulated clusters to the C position of the original 
cluster. Data shown in mean + s.d. This <6 nm distance confirms that the 
high-density peaks have negligible effect on the detection of the synaptic 
axis in this method. o, Distribution of all localizations along the synaptic
axis with bin size of 10 nm. Peak-to-peak distance between the synaptic
protein pair can be measured from this distribution. p-r, Distribution of 
peak-to-peak distances for three pairs of synaptic proteins. 
Extended Data Figure 2 | Nanocluster organization of vesicle release
machinery proteins in the active zone and postsynaptic AMPA 
receptors. a, En face (top) and side (bottom) views of local density maps 
of a simulated synapse with artificial nanoclusters with 40-nm diameters. 
Scale bar, 100 nm. b, Autocorrelation function of simulated clusters 

with different sized nanoclusters. The points represent the radius where 
g(r) = 1. c, Pooled data from 15 sets of simulations showing that the radius
where g(r) first crosses 1 reasonably estimates the average nanocluster 
diameters. d, Comparison of nanocluster number, fraction of localization 
in nanocluster, and nanocluster volume across different developmental 
stages shows no significant difference, though the young 9 days in vitro 
(DIV) culture shows a trend towards increased nanocluster numbers 
(one-way ANOVA on ranks for nanocluster number and volume, one-way 
ANOVA for percentage localization in nanocluster). Data were from 

143 RIM nanoclusters and 135 PSD nanoclusters of 64 DIV 9 synapses, 

63 RIM nanoclusters and 65 PSD nanoclusters of 38 DIV 14 synapses, and 
44 RIM nanoclusters and 41 PSD nanoclusters from 28 DIV 21 synapses. 
e, Comparison of two RIM antibodies (from left to right) in whole 
synaptic cluster volume, number of nanoclusters, autocorrelation function 
estimating average nanocluster diameter, and protein density relative to 
PSD-95 nanocluster centres. Anti-RIM1/2 (Synaptic Systems #140-203) 
targets the zinc-finger domain and anti-RIM1 targets the PDZ domain 

of RIM1 (Synaptic Systems #140-003). These tests suggest that there is 

no significant difference between these two antibodies. The numbers in 
bars denote the group sizes. f, Local density maps of en face (top) and side
(bottom) views of an example Munc13 cluster. Scale bar, 200 nm. g, Auto- 
correlation functions for Munc13 distributions compared to simulated 
randomized distributions. h, i, Local density maps and ACF of Bsn cluster. 
Scale bar, 200 nm. j, Pooled cluster volumes, normalized to PSD-95 
volumes within each synapse. Each bar pair represents data from a set of 
RIM1/2-PSD-95, Munc13-PSD-95 or Bsn-PSD-95 staining. The numbers 
in bars denote the group sizes. k, Distribution of en face distances between 
nanocluster centre and synapse centre. Data were normalized to the 
distribution of simulated clusters with the same number of nanoclusters as 
the original synapse but randomized positions. l, An example synapse with
RIM1/2 and Munc13 staining of the same synapse, shown in two different
angles. The translucent surfaces represent the alpha shapes that define the 
synaptic cluster borders. m, Pooled RIM1/2 and Munc13 cluster volumes, 
normalized to RIM1/2 within each synapse. n, Pooled RIM1/2, Munc13 
and Bsn cluster volumes from staining of RIM1/2-Bsn and Munc13-Bsn, 
normalized to Bsn within each synapse. *P < 0.05; ***P < 0.001; Wilcoxon 
signed-rank test. +P < 0.05, one-way ANOVA on ranks with pairwise 
comparison procedures (Dunn’s method). o, Local density map of a GluA2
cluster. p, Auto-correlation functions for GluA2 distributions compared 

to simulated randomized distributions. q, Local density map of a GluR2/3 
cluster. r, Auto-correlation functions for GluR2/3 distributions compared 
to simulated randomized distributions. All experiments were repeated 

>3 times. 
Extended Data Figure 3 | Detected nanoclusters are unlikely a result

of labelling artefacts or overcounting of molecules. a—i, Comparison of 
PSD-95 labelled with monoclonal primary antibodies directly conjugated 
to Alexa647 dye (1°-A647, red) with the same molecules labelled with 
primary and secondary antibodies conjugated to Cy3 (1°-2°-Cy3, blue) 

as represented in c. a, b, Comparison between non-synaptic small groups 
of localizations arising from isolated primary antibodies and secondary 
antibodies. Schematic shown in a. Standard deviation of localizations in 
both groups along different dimensions (n = 32 for A647; n = 36 for Cy3) 
in b. The two types of localizations groups showed similar variation in all 
dimensions. d, Local density maps of the same PSD-95 cluster labelled 
with 1°-A647 (top) and 1°-2°-Cy3 (middle) and overlapped distribution 
of 1°-A647 and 2°-Cy3 with detected nanoclusters highlighted in darker 
colours (bottom). Scale bar, 200 nm. e, Autocorrelation of synaptic clusters 
labelled with 1°-A647 and 1°-2°-Cy3. f, Autocorrelation of isolated small 
groups of localizations of A647 and Cy3 dyes. g, Comparison of the radius 
at which the autocorrelation function crossed with the random level 

(g(r) = 1). There was no difference between PSD-95 clusters with different 
labelling methods, but the r(0) for isolated localization groups were 
significantly less than r(0) for PSD-95 clusters. **P < 0.01, t-test between 
the filled and open bars of the same colour. h, Nanoclusters detected in 
both channels displayed no difference in number, volume, or the fraction 
of nanoclusters enriched with localizations from the other channel. 

i, Protein enrichment of localizations detected in each channel with those
in the other channel (n = 32 synapses). These results demonstrate that

the nanoclusters we detected in our study were not due to aggregation 

of multiple secondary antibodies to the primary antibodies. j-r, Cells 
transfected with knockdown-rescue-PSD-95-GFP were labelled with
nanobodies against GFP conjugated at a 1:1 ratio with Atto647 (Nb-At647,
red) and primary/secondary antibodies against PSD-95 (1°-2°-Cy3, blue)
as depicted in l. j, k, Comparison between non-synaptic small groups of
localizations arising from isolated Nb-At647 and 1°-2°-Cy3 (as depicted
in j, n = 26 and 28, respectively). k, The nanobodies showed a significantly
smaller size than antibodies. ***P < 0.001, two-way ANOVA, †P < 0.05,
††P < 0.01, pairwise comparison (Tukey test) between nanobodies and
antibodies. m-r, Similar comparison as in d-i between PSD-95 clusters 
labelled with Nb-At647 and 1°-2°-Cy3 (n= 13 synapses). Scale bar, 

200 nm. Overall, these results demonstrated that the nanoclusters we 
detected in our study were unlikely a result of artefacts of antibody binding 
and labelling. The difference between the size of the isolated localizations 
groups and PSD-95 clusters calculated by autocorrelation also argues 
against the possibility that the nanoclusters we detected were owing 

to repetitive switching of one or a few fluorophores. **P < 0.01, t-test 
between the filled and open bars of the same colour. s, An example synapse 
with nanoclusters highlighted before (upper) and after (lower) removal 

of localizations resulting from fluorophores lasting for multiple frames. 
Scale bar, 100 nm. t, Paired autocorrelation function of synaptic clusters 
with and without multiple-frame molecules. P=0.77, n= 25 synapses 

for RIM1/2; P=0.58, n= 25 synapses for PSD-95, two-way ANOVA with 
repeated measures. u, The tracking removed 13 + 8% and 17 + 9% of the 
localizations for RIM1/2 and PSD-95, respectively, but had no significant 
effects on autocorrelation function results, nanocluster numbers, or 
nanocluster volumes. **P < 0.01; ***P < 0.001; NS, P> 0.05; Wilcoxon 
signed-rank test. All data were pooled from >3 replicas. 
Extended Data Figure 4 | 1AP evoked release is [Ca²⁺] dependent and
mainly univesicular. a, Example of fluorescence signals at a single
bouton over repeated trials of 1 action potential stimulation. b, Single
event traces of vGpH fluorescence increase following 1 action potential
stimuli in standard (2 mM) or heightened extracellular [Ca²⁺] (4 mM).

c, Comparison of distributions of fluorescence changes in 2 mM
(n = 233/27) and 4 mM (n = 115/12) extracellular [Ca²⁺], relative to noise
distributions obtained from the baseline frames before stimulation.



d, Comparison of noise-subtracted distributions of fluorescence changes
in different [Ca²⁺]. e, Processed images of vGpH fluorescence increase
following 1 action potential stimuli from three trials ten trials apart.

f, Automatic detection using pHuse of events shown in e. g, Summed
projection of framewise and background subtracted vGpH fluorescence
increases over 60 trials. h, pHuse localizations on Syn1a (white). i-l, Same
as e-h for spontaneous events in TTX over 5 min. n given in synapses/
experiments.
Extended Data Figure 5 | pHuse reveals differences between evoked and
spontaneous fusion site areas. a, Comparison of spontaneous frequency 
measured presynaptically using vGpH (n = 77/22) and postsynaptically 
using GCaMP6f (ref. 45) (n = 61/5), t= 1.02, not significant. b, Average 
bouton areas across groups, t = 0.87, not significant. c, Cumulative 
distributions of fusion areas for spontaneous and evoked release 
(Kolmogorov-Smirnov test, *D = 0.23) d, Cumulative distributions 

of normalized fusion areas for 1 AP evoked fusion excluding events 

with photon counts > mean + 2 s.d. of spontaneous events (n = 91/27) 
compared to all evoked events (n = 104/28, Kolmogorov-Smirnov 

test, D=0.05, not significant) and spontaneous events (n = 77/22, 
Kolmogorov-Smirnov test, *D = 0.25). e, f, Notably, while evoked P_r

was significantly positively correlated with Syn1a area, as reported
previously, spontaneous event frequency showed no relationship with
Syn1a area (e, linear fit, evoked **R = 0.30, spontaneous R = 0.12, not
significant). On the other hand, both spontaneous event frequency and
evoked P_r significantly positively correlated with pHuse area (f, linear fit,



evoked ***R = 0.64, spontaneous ***R= 0.60). This suggests that pHuse 
area may be a better approximation for active zone area and the functional 
parameters of a synapse than bouton area. g, Normalized pHuse area as a 
function of cell age shows no significant correlation (evoked R= 0.03, not 
significant, spontaneous R = 0.004, not significant). e-g, n_evoked = 104/28,
n_spont = 77/22. h, Normalized pHuse area was not significantly different
at room temperature (n_evoked = 51/10, n_spont = 32/7) versus physiological
temperature (n_evoked = 35/9, n_spont = 34/4) within modes of release but still
significantly different between modes of release. i, Normalized pHuse
area was not significantly different at different thresholds for Syn1a within
modes of release but still significantly different between modes of release
(n = 51/10). j, Both numbers of events and mode of release are significant
factors for pHuse area, but they do not have a significant interaction
(n_evoked = 155/38, n_spont = 109/29). For i, j, see Supplementary Tables
for statistics. n given in synapses/experiments, *P < 0.05; **P < 0.01;
***P < 0.001.
Extended Data Figure 6 | RIM1-mEos3.1 PALM identifies nanoclusters.
a, Neurons co-expressing RIM1-mVenus (a gift from P. Kaeser)
and Syn1a-CFP colocalize to the same boutons. Right panels show
enlargement of areas within the white boxes. Scale bars, 5 μm (left) and
1 μm (right). b, Neurons expressing RIM1-mVenus immunostained for
RIM1/2 and Bsn. Arrowheads point to some colocalized active zones. Scale
bar, 2 μm. c, Immunofluorescence intensity of transfected cells normalized
to nearby untransfected cells show 3.74 ± 0.11-fold overexpression of RIM
and 1.24 ± 0.03-fold increase in Bsn (n = 262 synapses/7 cells). d, Photon
count distribution of RIM1-mEos3.1 (3997 localizations). e, Same boutons
shown in Fig. 2 visualized using 5 × nearest neighbour density (NND) as a
measure of local density. f-h, Cumulative distributions of PALMed
RIM1 nanoclusters diameter, area, and number, respectively, identified
using adapted Tesseler analysis and 5 × NND analysis (n = 65/13).
i, RIM1 localization density as a function of radial distance from pHuse
localizations. (See Supplementary Tables for statistics.) j, Mean distance
from pHuse localizations as a function of local density measured by
5 × NND (raw data ***R = 0.23, n = 26/13). k, Proportion of pHuse
localizations within 40 nm of a RIM1 localization as a function of RIM1
local density measured by 5 × NND (***R = 0.35). n given in synapses/
experiments unless otherwise specified, ***P < 0.001.
e, Pooled enrichment index of three active zone proteins and PSD-95.
The simulated release site distribution at a synapse was drawn from its
measured RIM positions and the average measured relationship between
RIM density and pHuse locations (Fig. 2). b, Distributions of measured
RIM localizations within a single active zone boundary
(grey), and the same cluster with randomized positions of the indicated
subsets of molecules. c, Maps of RIM local density normalized to the
overall densities within the active zones. d, Probability density maps 

of possible release sites given that a release occurs. e, Distributions 

of GluA2/3 locations within the PSD boundary (grey) of the same 
measured synapse (ellipses refer to this distribution) and randomized. 

f, Maps of fraction of open channels at peak response per average release 
from the respective active zones directly above them in d. g, Calculated 
open channels at peak response, n = 20 randomly generated molecular 
distributions. See Methods for more details. 
Extended Data Figure 9 | Enrichment of other scaffolding proteins
within nanocolumns. a, Enrichment of Homer1 with PSD-95
nanoclusters, n = 118 nanoclusters from 48 synapses. Scale bar, 100 nm.

b, Enrichment of RIM1/2 to Shank nanoclusters, n = 80 nanoclusters 
from 32 synapses. Scale bar, 200 nm. *P < 0.05, ANOVA on ranks 

with pairwise comparison procedures (Dunn’s method) in a and b. 

c, GKAP2 and Shank3 densities (determined with STORM, n=6 and 

12, respectively) within PSD-95 nanoclusters (determined with PALM of 
transfected knockdown-replacement-PSD-95-mEos2) normalized to total 
PSD densities. Both proteins showed significant enrichment in PSD-95 
nanoclusters, *P < 0.05, paired t-tests. d, Three-colour STORM imaging 
of RIM1/2, GKAP1 and PSD-95 on the same synapses example (left) and 
protein enrichment profiles of RIM1/2 and GKAP1 with respect to PSD-95 
nanoclusters (right), n = 32 nanoclusters from 17 synapses. Scale bar, 
Extended Data Figure 10 | Plasticity within nanocolumns. a, Changes 
in the localization density within RIM1/2 (red) and PSD-95 (blue) 
nanoclusters under control, 5 min NMDA treatment, 25 min washout, 
and NMDA + AP5 treatment conditions. b–h, Reorganization of RIM1/2 
and GluR2/3 under control, 5 min NMDA treatment and 25 min washout 
conditions: examples (b), comparison of whole synaptic cluster sizes 
(c), nanocluster number per synapse (d), localization density within 
nanoclusters (e), enrichment indices (f), percentage of nanoclusters that 
were enriched (g), and nanocluster volumes (h). Note that similar to the 
results from the RIM1/2-PSD-95 analyses, only those RIM1/2 nanoclusters 
that were enriched with GluR2/3 (dark red) were increased in volume. 
*P < 0.05; **P < 0.01, ANOVA on ranks with pairwise comparison to 
control group (Dunn’s method), and χ² test for the proportion. Data 
from 62, 21 and 37 nanoclusters from 34, 18 and 24 synapses for control, 
NMDA, and washout, respectively. i, Colour-coded local density map of an 
example live-PALMed PSD-95 cluster before and after NMDA treatment. 
Scale bar, 100 nm. j, k, Changes in PSD-95 nanocluster area induced by 
NMDA and blocked by AP5 (n = 28 and 21, respectively). **P < 0.01, 
NS, not significant, paired t-test. l–n, LTP stimulation induced changes 
in nanocluster volumes (l), localization density within nanoclusters (m) 
and nanocluster numbers (n). *P < 0.05, ANOVA on ranks with pairwise 
comparison to control group (Dunn’s method). All experiments were 
repeated >5 times. 
Tumour-cell-induced endothelial cell necroptosis 
via death receptor 6 promotes metastasis 


Boris Strilic, Lida Yang, Julian Albarrán-Juárez, Laurens Wachsmuth, Kang Han, Ulrike C. Müller, Manolis Pasparakis 
& Stefan Offermanns 


Metastasis is the leading cause of cancer-related death in humans. It 
is a complex multistep process during which individual tumour cells 
spread primarily through the circulatory system to colonize distant 
organs¹⁻³. Once in the circulation, tumour cells remain vulnerable, 
and their metastatic potential largely depends on a rapid and efficient 
way to escape from the blood stream by passing the endothelial 
barrier⁴⁻⁶. Evidence has been provided that tumour cell extravasation 
resembles leukocyte transendothelial migration⁷⁻⁹. However, it 
remains unclear how tumour cells interact with endothelial cells 
during extravasation and how these processes are regulated on 
a molecular level. Here we show that human and murine tumour 
cells induce programmed necrosis (necroptosis) of endothelial 
cells, which promotes tumour cell extravasation and metastasis. 
Treatment of mice with the receptor-interacting serine/threonine- 
protein kinase 1 (RIPK1)-inhibitor necrostatin-1 or endothelial-cell- 
specific deletion of RIPK3 reduced tumour-cell-induced endothelial 
necroptosis, tumour cell extravasation and metastasis. In contrast, 
pharmacological caspase inhibition or endothelial-cell-specific loss 
of caspase-8 promoted these processes. We furthermore show in vitro 
and in vivo that tumour-cell-induced endothelial necroptosis leading 
to extravasation and metastasis requires amyloid precursor protein 
expressed by tumour cells and its receptor, death receptor 6 (DR6), 
on endothelial cells as the primary mediators of these effects. Our 
data identify a new mechanism underlying tumour cell extravasation 
and metastasis, and suggest endothelial DR6-mediated necroptotic 
signalling pathways as targets for anti-metastatic therapies. 

Apoptosis and necroptosis are major forms of regulated cell death¹⁰. 
Necroptosis involves in part molecules also found to regulate apoptosis 
but depends on the formation of the necrosome, consisting of RIPK1 
and RIPK3, which activates the pseudokinase mixed lineage kinase 
domain-like (MLKL) by phosphorylation, leading to the execution 
of the necroptotic program¹¹⁻¹³. Critical for the formation of the 
necrosome is a compromised caspase-8 activity, which functions as a 
mediator of apoptosis¹⁴,¹⁵. 

While studying the interaction of tumour cells (TCs) and endothelial 
cells (ECs), we found an increased number of dead ECs when 
co-cultured with TCs (Fig. 1a). Dying ECs lacked typical apoptotic 
features such as nuclear condensation and/or fragmentation¹⁶ as well as 
annexin-V-positivity that were found in apoptotic ECs after treatment 
with TRAIL, staurosporine or tumour-necrosis factor-α (TNF-α) 
(Fig. 1a and Extended Data Fig. 1a–c). Instead, similar to nuclei of cells 
that underwent H₂O₂- or hypoxia-induced accidental necrosis, nuclei 
of ECs exposed to TCs showed minor changes in morphology but were 
positive for ethidium-homodimer-III (EthD-III), which, like propidium 
iodide, indicates necrotic cells with compromised membrane integrity¹⁶ 
(Fig. 1a, Extended Data Fig. 1a, d and Supplementary Videos 1–4). 
TC-induced necrotic EC death was not affected by addition of 
peripheral blood mononuclear cells and platelets (Extended Data 
Fig. 1e) and was seen during co-culture of different primary human and 
murine ECs with a variety of TC lines in a concentration-dependent 
manner (Fig. 1b and Extended Data Fig. 1f–h). 

EthD-III-positive ECs also co-stained for phospho-MLKL (Extended 
Data Fig. 1i), and knockdown of RIPK3 or MLKL in ECs as well as 
Figure 1 | TC-induced endothelial necroptosis promotes TC 
transendothelial migration. a, Fluorescent images of human umbilical 
vein ECs (HUVECs) cultured in the presence of TCs (green) and stained 
as indicated. Arrowheads, EthD-III-positive ECs; scale bar, 10 µm. b, EC 
necrosis upon exposure to different TCs. c–f, EC necrosis in the presence 
of MDA-MB-231 TCs (c, d) and MDA-MB-231 TC transmigration over 
an endothelial layer (e, f) after siRNA-mediated knockdown in ECs as 
indicated (c, e) or after treatment with Nec-1 (30 µM) or z-VAD-fmk 
(zVAD, 100 µM) (d, f). Shown are representative data of two (b) or three 
(c–f) independent experiments with mean values ± s.e.m. (b–d) or ± s.d. 
(e, f) from biological triplicates (n = 3) (b) or sextuplicates (n = 6) (c–f). 
*P < 0.05; **P < 0.01; ***P < 0.001. One-way analysis of variance 
(ANOVA) and Bonferroni's post hoc test. 
treatment of cells with the RIPK1 kinase inhibitor necrostatin-1 
(Nec-1) blocked TC-induced endothelial necrotic cell death (Fig. 1c, d 
and Extended Data Fig. 1j, k). In contrast, knockdown of caspase-8, 
a major negative regulator of necroptosis¹⁴,¹⁵, or inhibition of caspases 
by treatment of cells with z-VAD-fmk (zVAD) increased the number 
of necrotic ECs (Fig. 1c, d and Extended Data Fig. 1j). These data 
identify the mode of TC-induced EC death as necroptosis. Interestingly, 
inhibition of TC-induced necroptosis reduced TC migration over an EC 
layer whereas enhanced necroptosis promoted transmigration (Fig. 1e, f). 

Within a few hours after intravenous (i.v.) injection of syngeneic 
B16F10 (B16) melanoma or LLC1 (Lewis lung carcinoma line 1) 
cells into C57BL/6 wild-type (WT) mice, the lungs of these animals 
showed EthD-III-positive cells (Fig. 2a and Extended Data Fig. 2a), 
which co-stained for the EC markers ERG and CD31 (Fig. 2b and 
Extended Data Fig. 2b) but did not show any overlap with fluorescently 
labelled TCs or CD45-positive leukocytes (Extended Data Fig. 2c). 
EthD-III-positive ECs showed no signs of chromatin condensation 
or fragmentation (Fig. 2a) and were negative for cleaved caspase-3, 
TdT-mediated dUTP nick end labelling (TUNEL) or annexin-V 
(Fig. 2a and Extended Data Fig. 2d). TC-induced EC necrosis was not 
a result of physical blood vessel occlusion, as it was not reproduced by 
i.v. injection of equal amounts of 15 µm microspheres (Extended Data 
Fig. 2e). In mice with tamoxifen-induced endothelium-specific RIPK3- 
deficiency (Tie2-CreER^T2;RIPK3^flox/flox (RIPK3^ECKO); Extended 
Data Fig. 3a, b), which had normal pulmonary vascular permeability 
(Extended Data Fig. 3c), numbers of EthD-III-positive ECs induced 
after TC injection were reduced (Fig. 2c, d). Moreover, the number 
of transmigrated TCs over ECs derived from these knockout animals 
was reduced in vitro (Extended Data Fig. 3d), and loss of endothelial 
RIPK3 expression strongly reduced numbers of extravasated TCs 
6 h after i.v. TC injection as well as numbers of metastases 12 d later 
(Fig. 2e-g and Extended Data Fig. 3e) or of metastases derived 
from primary tumours (Extended Data Fig. 3f). Similar effects 
animals treated with dimethylsulfoxide (DMSO), 
Nec-1 or z-VAD-fmk (zVAD) (i–k). Six hours 
later, pulmonary EthD-III-positive ECs (d, i) 
and extravascular TCs (e, j) were quantified, 
and 12 d later lung metastases were determined 
(g, k). c, f, Confocal images of lung sections 
(scale bar, 50 µm) and images of lungs. 
h, Experimental design as performed in i–k. 
Cre-negative littermates served as controls 
(c–g). Shown are representative data of 
three independent experiments with mean 
values ± s.e.m. (d, e, i, j) or ± s.d. (g, k) from 
n = 4 (d, e, i, j) or n = 6 animals (g, k) per 
condition. *P < 0.05; **P < 0.01; ***P < 0.001. 
One-way ANOVA and Bonferroni's post hoc test. 
were observed in MLKL-deficient mice (Extended Data Fig. 4) or 
after short-term treatment of WT mice with Nec-1, which did not 
affect TC proliferation, viability or migration in vitro (Fig. 2h-k and 
Extended Data Fig. 5a-e). Also Nec-1s reduced metastasis of murine 
TCs (Extended Data Fig. 5f), and Nec-1 treatment resulted in fewer 
necroptotic ECs and metastases when human MDA-MB-231 TCs 
were injected into immunodeficient SCID mice (Extended Data 
Fig. 5g, h). In contrast, animals with induced EC-specific caspase-8- 
deficiency (Tie2-CreER^T2;Casp8^flox/flox (Casp8^ECKO), Extended Data 
Fig. 6a), which had normal pulmonary vascular permeability (Extended 
Data Fig. 6b), and WT mice treated with zVAD showed increased 
numbers of EthD-III-positive ECs and extravasated TCs, and animals 
developed more metastases including those from primary tumours 
(Fig. 2c-k and Extended Data Fig. 6c-i). Thus, TCs induce necroptotic 
EC death also in vivo and EC necroptosis facilitates extravasation of 
TCs and promotes metastasis formation. 

To identify potential endothelial receptors that mediate TC-induced 
necroptosis, we performed short interfering RNA (siRNA)-mediated 
knockdowns of 32 candidate genes (Supplementary Table 1). 
Knockdown of DR6 (also known as tumour-necrosis factor receptor 
superfamily 21 (TNFRSF21)) in ECs reduced TC-induced necroptosis 
as well as transendothelial TC migration (Fig. 3a–c and Extended 
Data Fig. 7a). The TNF receptor family member DR6 is expressed 
by mouse and human ECs of different vascular beds (Extended Data 
Fig. 7b–d), and contains a cytosolic death domain that enables cell 
death signalling¹⁷,¹⁸. Mice lacking DR6 showed reduced metastasis 
(Fig. 3d), and DR6 expressed by immune cells¹⁷,¹⁹ is not involved 
in metastasis (Extended Data Fig. 7e–j). Blocking DR6-function by 
an anti-DR6 antibody (5D10)²⁰ protected ECs from TC-induced 
necroptosis and strongly reduced the number of TCs migrating over 
an endothelial layer (Extended Data Fig. 8a, b). The TNF-α blocker 
etanercept had no effect (Extended Data Fig. 8c–e). Animals treated 
within 6 h after TC injection with the anti-DR6 antibody showed reduced 
Figure 3 | DR6 mediates TC-induced endothelial 
necroptosis and metastasis formation. a, Effect 
of endothelial siRNA-mediated knockdowns on 
MDA-MB-231 TC-induced necroptosis. b, c, Effects 
of endothelial DR6 (TNFRSF21) knockdown on 
MDA-MB-231 TC-induced EC necroptosis (b) 
and transmigrated TCs (c). d, Lung metastases in 
numbers of necroptotic ECs and extravasated TCs, and these animals 
developed fewer metastases (Fig. 3e-j and Extended Data Fig. 8f). 
These DR6-mediated effects required ligand binding to DR6 as the DR6 
ectodomain fused to an Fc fragment (DR6-Fc) inhibited TC-induced 
endothelial necroptosis and transendothelial migration of TCs as well as 
development of metastasis (Fig. 3e, k-m and Extended Data Fig. 8g-j). 

A previously identified ligand of DR6 is amyloid precursor protein 
(APP)²¹,²², which is widely expressed including in various TCs²³⁻²⁵ 
(Extended Data Fig. 9a, b). Knockdown of APP expression in TCs 
strongly reduced their ability to induce endothelial necroptosis and 
to transmigrate an EC layer (Fig. 4a, b and Extended Data Fig. 9c). 
Neither conditioned media from TC–EC co-cultures (Extended Data 
Fig. 9d) nor the purified soluble APPsα fragment²⁶ or cells transfected 
with constructs to overexpress APPsα induced endothelial necroptosis 
(Fig. 4c and Extended Data Fig. 9e). However, necroptosis was induced by 
direct contact of ECs with cells overexpressing the transmembrane full- 
length form of APP (HEK^APP or COS-1^APP) (Fig. 4d and Extended Data 
Fig. 9e). Thus, consistent with the notion that the efficacy of TNF-receptor 
family ligands to induce cell death is strongly increased when present in 
their transmembrane form²⁷, APP exposed by TCs rather than soluble 
APP released from TCs is required for TC-induced DR6-mediated 
endothelial necroptosis. TCs in which APP expression was strongly 
reduced by siRNA-mediated knockdown showed normal proliferation, 
cell survival and basal migratory activity (Extended Data Fig. 10a–d) but 
reduced ability to transmigrate a layer of primary lung ECs (Extended 
Data Fig. 10e, f). In addition, these TCs almost completely lost the ability 
after i.v. injection to induce EC necroptosis, to extravasate and to form 
metastases (Fig. 4e–i and Extended Data Fig. 10g). Interestingly, 
epidemiological data indicate that high APP expression in TCs is associated 
with higher frequency of metastasis formation and poor prognosis²³⁻²⁵. 

How does endothelial necroptosis facilitate metastasis formation? It 
is conceivable that dying ECs provide a gap through which TCs can pass 
and start to extravasate (Extended Data Fig. 10h). It is also possible that 
damage-associated molecular pattern (DAMP) molecules, which are 
released from lysed necroptotic cells²⁸, act on TCs, neighbouring ECs 
or other cells and thereby promote TC extravasation and metastasis 
(Extended Data Fig. 10h). A DAMP molecule associated with necrotic 
or necroptotic cell death is ATP, which was shown to have pro- 
migratory effects on TCs²⁹ and to induce opening of the endothelial 
barrier and to promote TC metastasis³⁰. 

Our data show that TCs induce endothelial necroptotic death 
via APP and DR6, which is required for efficient TC extravasation 
and metastases. Since necrosis or necroptosis are rare events under 
physiological conditions, targeting DR6-mediated necroptosis of ECs 
in the context of tumour progression may represent a novel approach 
to prevent or treat metastasis. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 25 August 2015; accepted 4 July 2016. 
Published online 3 August 2016. 


1. Valastyan, S. & Weinberg, R. A. Tumor metastasis: molecular insights and 
evolving paradigms. Cell 147, 275-292 (2011). 

2. Wan, L., Pantel, K. & Kang, Y. Tumor metastasis: moving new biological insights 
into the clinic. Nature Med. 19, 1450-1464 (2013). 

3. Vanharanta, S. & Massagué, J. Origins of metastatic traits. Cancer Cell 24, 
410-421 (2013). 

4. Bos, P. D. et al. Genes that mediate breast cancer metastasis to the brain. 
Nature 459, 1005-1009 (2009). 

5. Padua, D. et al. TGFβ primes breast tumors for lung metastasis seeding 
through angiopoietin-like 4. Cell 133, 66-77 (2008). 

6. Gupta, G. P. et al. Mediators of vascular remodelling co-opted for sequential 
steps in lung metastasis. Nature 446, 765-770 (2007). 

7. Reymond, N., d’Agua, B. B. & Ridley, A. J. Crossing the endothelial barrier 
during metastasis. Nature Rev. Cancer 13, 858-870 (2013). 

8. Labelle, M. & Hynes, R. O. The initial hours of metastasis: the importance of 
cooperative host-tumor cell interactions during hematogenous dissemination. 
Cancer Discov. 2, 1091-1099 (2012). 


218 | NATURE | VOL 536 | 11 AUGUST 2016 


9. Joyce, J. A. & Pollard, J. W. Microenvironmental regulation of metastasis. Nature 
Rev. Cancer 9, 239-252 (2009). 

10. Galluzzi, L. et al. Essential versus accessory aspects of cell death: 
recommendations of the NCCD 2015. Cell Death Differ. 22, 58-73 (2015). 

11. Pasparakis, M. & Vandenabeele, P. Necroptosis and its role in inflammation. 
Nature 517, 311-320 (2015). 

12. Zhou, W. & Yuan, J. Necroptosis in health and diseases. Semin. Cell Dev. Biol. 
35, 14-23 (2014). 

13. Silke, J., Rickard, J. A. & Gerlic, M. The diverse role of RIP kinases in necroptosis 
and inflammation. Nature Immunol. 16, 689-697 (2015). 

14. Oberst, A. et al. Catalytic activity of the caspase-8–FLIP(L) complex inhibits 
RIPK3-dependent necrosis. Nature 471, 363-367 (2011). 

15. Kaiser, W. J. et al. RIP3 mediates the embryonic lethality of caspase-8-deficient 
mice. Nature 471, 368-372 (2011). 

16. Krysko, D. V., Vanden Berghe, T., D’Herde, K. & Vandenabeele, P. Apoptosis and 
necrosis: detection, discrimination and phagocytosis. Methods 44, 205-221 
(2008). 

17. Pan, G. et al. Identification and functional characterization of DR6, a novel 
death domain-containing TNF receptor. FEBS Lett. 431, 351-356 
(1998). 

18. Lavrik, I., Golks, A. & Krammer, P. H. Death receptor signaling. J. Cell Sci. 118, 
265-267 (2005). 

19. Liu, J. et al. Enhanced CD4⁺ T cell proliferation and Th2 cytokine production in 
DR6-deficient mice. Immunity 15, 23-34 (2001). 

20. Huang, G. et al. Death receptor 6 (DR6) antagonist antibody is neuroprotective 
in the mouse SOD1(G93A) model of amyotrophic lateral sclerosis. Cell Death 
Disease 4, e841 (2013). 

21. Nikolaev, A., McLaughlin, T., O’Leary, D. D. M. & Tessier-Lavigne, M. APP binds 
DR6 to trigger axon pruning and neuron death via distinct caspases. Nature 
457, 981-989 (2009). 

22. Xu, K., Olsen, O., Tzvetkova-Robey, D., Tessier-Lavigne, M. & Nikolov, D. B. The 
crystal structure of DR6 in complex with the amyloid precursor protein 
provides insight into death receptor activation. Genes Dev. 29, 785-790 
(2015). 

23. Takagi, K. et al. Amyloid precursor protein in human breast cancer: an 
androgen-induced gene associated with cell proliferation. Cancer Sci. 104, 
1532-1538 (2013). 

24. Takayama, K. et al. Amyloid precursor protein is a primary androgen target 
gene that promotes prostate cancer growth. Cancer Res. 69, 137-142 
(2009). 

25. Yang, Z., Fan, Y., Deng, Z., Wu, B. & Zheng, Q. Amyloid precursor protein as a 
potential marker of malignancy and prognosis in papillary thyroid carcinoma. 
Oncol. Lett. 3, 1227-1230 (2012). 

26. Hick, M. et al. Acute function of secreted amyloid precursor protein fragment 
APPsα in synaptic plasticity. Acta Neuropathol. 129, 21-37 (2015). 

27. O'Reilly, L. A. et al. Membrane-bound Fas ligand only is essential for 
Fas-induced apoptosis. Nature 461, 659-663 (2009). 

28. Kaczmarek, A., Vandenabeele, P. & Krysko, D. V. Necroptosis: the release of 
damage-associated molecular patterns and its physiological relevance. 
Immunity 38, 209-223 (2013). 

29. Takai, E. et al. Autocrine regulation of TGF-β1-induced cell migration by 
exocytosis of ATP and activation of P2 receptors in human lung cancer cells. 
J. Cell Sci. 125, 5051-5060 (2012). 

30. Schumacher, D., Strilic, B., Sivaraj, K. K., Wettschureck, N. & Offermanns, S. 
Platelet-derived nucleotides promote tumor-cell transendothelial 
migration and metastasis via P2Y2 receptor. Cancer Cell 24, 130-137 
(2013). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank S. Fulda and our friends and colleagues for 
comments on the manuscript. We also thank S. Hummer for secretarial help 
and C. Ringel, J. Hoffmann, I.-M. Gross, D. Magalei and M. Winkels for technical 
help. This work was supported by the German Cancer Aid and the Max Planck 
Society. K.H. was supported by the China Scholarship Council. U.C.M. was 
supported by a grant from the Deutsche Forschungsgemeinschaft (MU 1457/9-2). 
M.P. received funding from the European Research Council (grant agreement 
323040), the Deutsche Forschungsgemeinschaft (SFB670, SFB829), Worldwide 
Cancer Research (grant 15-0228) and the Helmholtz Alliance Preclinical 
Comprehensive Cancer Center. 


Author Contributions B.S. performed most of the experiments and analysed 
the data. L.Y. generated mice with a conditional Ripk3 allele and contributed 
to in vitro and in vivo studies. J.A.J. contributed to in vitro experiments. L.W. and 
M.P. generated MLKL-deficient animals. K.H. and U.C.M. purified APPsα and 
performed APP-related experiments. B.S. and S.O. designed the study, discussed 
data and wrote the manuscript. All authors commented on the manuscript. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial 
interests. Readers are welcome to comment on the online version of the paper. 
Correspondence and requests for materials should be addressed to 
B.S. (boris.strilic@mpi-bn.mpg.de) or S.O. (stefan.offermanns@mpi-bn.mpg.de). 


Reviewer Information Nature thanks C. Betsholtz, S. Tavazoie and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


METHODS 

Cell death assays. HUVECs, human microvascular vein ECs from lung 
(HMVECs-L) or L929 cells (1.5 × 10⁴ at seeding in 100 µl) were cultured for 
24 h in 96-well plates. To induce cell death, cells were stimulated overnight with 
rhTRAIL (100 ng/ml, Peprotech), rhTNF-α (50 ng/ml, Peprotech), staurosporine 
(0.5 µM, Jena BioScience), H₂O₂ (1 mM, AppliChem) or rmTNF-α (100 ng/ml, 
Peprotech) or cultured under hypoxic conditions (1% O₂, O2 Control 
Glove Boxes, Coy Laboratory). Alternatively, for co-culture experiments, 
1.5 x 10° green fluorescent protein (GFP)-expressing TCs, calcein-AM-labelled 
TCs, COS-1 or HEK293T cells (for more details see below), or freshly isolated 
human peripheral blood mononuclear cells (PBMCs) stained with calcein-AM 
containing 20 times the number of platelets were added alone, in combination 
with each other or in the presence of the indicated substances onto the EC 
monolayer and cultured overnight: Nec-1 (30 µM), z-VAD-fmk (100 µM), 
etanercept (20 µg/ml), DR6-Fc or IgG-Fc (0.1–1 µg/ml), antagonistic antibody 
against DR6 (5D10) or IgG1 isotype control antibody (1–30 µg/ml). Unless stated 
otherwise, HUVECs and MDA-MB-231-GFP TCs were used for in vitro studies. 
PBMCs were isolated using standard protocols with Ficoll density gradient 
centrifugation. For supernatant experiments, HUVEC monolayers grown to 
confluency were cultured with conditioned medium obtained from HUVECs 
co-cultured in the presence of TCs for 18h. For knockdown experiments, 
1.5 x 10* HUVECs were transfected using Lipofectamine RNAiMAX (Life 
Technologies) with different sets of siRNAs (Sigma or Qiagen, see Supplementary 
Table 1) and cultured on 96-well plates. In cases where siRNA-mediated knock- 
down was performed on TCs, cells were transfected using Lipofectamine 
RNAiMAX with different sets of siRNA (Sigma) and seeded 48 h after transfec- 
tion on confluent monolayers of ECs. Knockdown efficiencies were determined 
by western blotting upon lysis with Laemmli buffer or by quantitative RT-PCR 
(LightCycler480, Roche). TC number upon gene knockdown or after treatment 
with Nec-1 or zVAD was determined by counting Hoechst 33342-positive cells, and 
TC death was determined by counting condensed and/or EthD-III-positive nuclei 
(see below). Cell migration was determined by a scratch assay³¹. 

Cell death analysis. For all conditions, EthD-III (1.6 µM, Biotium) and Hoechst 
33342 (2 µM, Thermo Scientific) were added shortly before automated image 
acquisition in an atmosphere-controlled chamber (37 °C, 5% CO₂) using an 
Olympus IX81 microscope or before overnight culture. On the basis of cells 
cultured under defined apoptotic, necrotic or necroptotic conditions and stained 
with Hoechst 33342 (a cell-permeable nuclear dye) and EthD-III (a membrane- 
impermeant nuclear dye), morphological criteria for discriminating apoptotic from 
necrotic (or necroptotic) cells compared with living cells were defined as follows. 
A living cell has a normal round to kidney-shaped nucleus (as visualized by Hoechst 
33342) and is negative for EthD-III. An apoptotic cell has a strongly condensed 
and fragmented nucleus and is negative for EthD-III. A necrotic or necroptotic 
cell has a normal round to kidney-shaped nucleus or a minor degree of nuclear 
shrinkage (no condensation and no fragmentation) and is positive for EthD-III. 
A late apoptotic cell is positive for EthD-III but can be discriminated from a 
necrotic/necroptotic cell on the basis of the strong condensation and fragmentation 
of its nucleus. To prevent repeatedly counting fragmented parts of apoptotic 
nuclei, a Gaussian blur with a radius of three pixels was applied to all images to 
be analysed. ECs were defined as GFP- or calcein-AM-negative cells. The total 
number of all endothelial nuclei was determined through a low threshold (TH1) 
and application of a watershed on the resulting binary image over all Hoechst 
33342-positive nuclei (minus the nuclei from TCs). When possible, a second 
separate threshold (TH2) was used to determine the number of condensed nuclei. 
The number of EthD-III-positive cells was determined through an independent 
second low threshold only. In cases where this automated analysis failed, the mode 
of cell death was determined manually for each individual EC by application of the 
criteria summarized above. All images were analysed in ImageJ (National Institutes 
of Health). Unless stated otherwise, each experiment was performed at least three 
times with a minimum of six wells per condition and four independent images 
acquired per well. 
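
The morphological criteria above amount to a two-feature decision rule. The following is a minimal sketch of that rule in Python, not the authors' ImageJ pipeline: the names `CellState`, `classify_nucleus` and `necrotic_fraction` are hypothetical, and the two boolean inputs are assumed to come from the thresholding steps (TH2 for condensed nuclei, the independent EthD-III threshold) described in the text.

```python
from enum import Enum


class CellState(Enum):
    LIVING = "living"
    APOPTOTIC = "apoptotic"
    LATE_APOPTOTIC = "late apoptotic"
    NECROTIC = "necrotic/necroptotic"


def classify_nucleus(condensed_and_fragmented: bool, ethd3_positive: bool) -> CellState:
    """Apply the stated criteria to one endothelial nucleus.

    condensed_and_fragmented: strong condensation and fragmentation (Hoechst 33342).
    ethd3_positive: nucleus stains with the membrane-impermeant dye EthD-III.
    """
    if condensed_and_fragmented:
        # Condensed nuclei are apoptotic; EthD-III uptake marks the late stage.
        return CellState.LATE_APOPTOTIC if ethd3_positive else CellState.APOPTOTIC
    # Normal or only slightly shrunken nucleus: EthD-III decides necrotic vs living.
    return CellState.NECROTIC if ethd3_positive else CellState.LIVING


def necrotic_fraction(cells):
    """Fraction of necrotic/necroptotic cells among (condensed, ethd3) pairs."""
    states = [classify_nucleus(condensed, ethd3) for condensed, ethd3 in cells]
    if not states:
        return 0.0
    return sum(state is CellState.NECROTIC for state in states) / len(states)
```

A call such as `necrotic_fraction([(False, True), (False, False)])` would report the share of EthD-III-positive, non-condensed nuclei in one image; how the two features are extracted from the images is left open here.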

Transwell assays. Assays were performed as described previously”. Briefly, ECs 
(1.5 × 10⁴ at seeding in 50 µl) were cultured for 2 d or, for knockdown experiments, 
8 x 10° HUVECs were transfected using Lipofectamine RNAiMAX with different 
sets of siRNA (Sigma or Qiagen, see Supplementary Table 1) and cultured on 
96-transwell plates with polyester membranes of 8 µm pore size (Corning) with 
daily medium changes until reaching confluency. For transmigration experiments, 
the medium of the upper compartment was removed and 7.5 x 10° GFP-expressing 
or calcein-AM-labelled TCs were added in 50 µl endothelial culture medium 
alone or in the presence of different substances (see above). For all experiments, 
transmigrated TCs on the lower side of the filter were imaged (Zeiss Axio Observer.Z1 
or Olympus IX81) and quantified with ImageJ. Each experiment was performed 
at least three times with a minimum of six wells per condition. 
Metastasis models. Unlabelled or CFSE-labelled TCs (B16F10 melanoma or 
LLC1 lung carcinoma cells) or fluorescent microspheres (15 µm diameter, Life 
Technologies) in 50 µl PBS were injected into the lateral tail vein of mice at 1,600 TCs 
or microspheres per gram body weight. For inhibitory experiments, two doses of 
25 µl Nec-1 (1.65 µg/g), Nec-1s (1.65 µg/g), z-VAD(OMe)-fmk (4 µg/g), etanercept 
(10 mg/kg), rmDR6-Fc (0.2 µg/g), rmIgG2A-Fc (0.2 µg/g), anti-DR6 (3.5 µg/g) or 
control IgG1 (3.5 µg/g) were injected into the tail vein shortly before and 3 h after 
TC injection. For evaluation of TC-induced EC death, 6 h after injection of TCs, 
50 µl EthD-III (300 µM in PBS) were injected i.v. and after 10 min animals were 
killed and perfused with PBS and 4% paraformaldehyde and directly processed for 
immunohistochemical analysis. For evaluation of the number of extravasated TCs, 
CFSE-labelled B16 TCs were injected i.v. and 6h later non-perfused lungs were 
isolated and fixed in 4% paraformaldehyde. Stained cryosections were analysed in 
xyz views on a Leica SP5 confocal microscope. The number of EthD-III-positive 
ECs was determined by manual counting of EthD-III/ERG- or EthD-III/CD31- 
double-positive cells on central areas of a minimum of four random longitudinal 
sections of the left lobe and the superior and middle right lobe of each lung. For 
quantification of extravasated TCs, cryosections were analysed by two criteria: 
TCs directly surrounded by CD31 staining (that is, blood vessel) and with a non- 
invasive phenotype (that is, round cell shape) were scored as intravascular, while 
cells outside blood vessels with an invasive phenotype (that is, irregular cell shape 
with protrusions) were scored as extravascular. For both analyses, numbers of 
EthD-III-positive ECs or extravasated TCs from each section of one lung were 
averaged (average per mouse), and these values were again averaged to obtain a 
mean value + s.e.m. from one experiment. For evaluation of lung metastases, an 
additional (third) treatment with the aforementioned substances was performed 
at 6 h after TC i.v. injection, and lung metastases were analysed 12 d later either 
macroscopically (B16/LLC1 i.v.) or microscopically (MDA-MB-231 i.v.) by 
analysing every tenth section after whole-lung sectioning and H&E staining. 
Alternatively, to study metastasis formation from primary tumours, 1 x 10°B16 or 
LLC1 TCs in 25 µl PBS were injected subcutaneously into the shaved flank of mice, 
and the primary tumour was surgically removed 10 d later, followed by microscopic 
lung metastases analysis 27 d later as described above. The mean number of lung 
metastases per experiment was determined by averaging the individual values 
of lung metastases from each lung resulting in a mean value + s.d. from one 
experiment. A minimum of three animals per group was used. Animal groups 
were sex-matched, and mice were 8-16 weeks of age. Mice were grouped randomly. 
Experiments were performed blind for genotype of the mice. All experimental 
animal procedures were approved by the Hessian Regional Board. 
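
The two-level averaging described above (sections averaged per mouse, then mice averaged per experiment with s.e.m.) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: the function name `experiment_mean_sem`, the dictionary layout and the example counts are hypothetical, and the s.e.m. is assumed to be the sample standard deviation of the per-mouse averages divided by the square root of the number of mice.

```python
from math import sqrt
from statistics import mean, stdev


def experiment_mean_sem(counts_by_mouse):
    """Return (mean, s.e.m.) for one experiment.

    counts_by_mouse maps a mouse identifier to the list of counts
    (EthD-III-positive ECs or extravasated TCs) from each lung section.
    """
    # Step 1: average the sections of each lung to get one value per mouse.
    per_mouse = [mean(sections) for sections in counts_by_mouse.values()]
    if len(per_mouse) < 2:
        raise ValueError("at least two animals are needed to estimate the s.e.m.")
    # Step 2: average the per-mouse values and compute their standard error.
    return mean(per_mouse), stdev(per_mouse) / sqrt(len(per_mouse))


# Hypothetical counts from two sections in each of three mice.
example = {"mouse_1": [2, 4], "mouse_2": [4, 6], "mouse_3": [6, 8]}
group_mean, group_sem = experiment_mean_sem(example)
```

Averaging within each animal first keeps the mouse, not the section, as the unit of replication, which matches the n values per condition reported in the figure legends.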

Mice. Control C57BL/6 and SCID animals were obtained from Charles River; 
B6F2D1 animals were obtained from Harlan. DR6^−/− animals were provided 
by Genentech (USA). Caspase-8^loxP/loxP animals were a gift from S. Hedrick 
(University of California San Diego, USA) provided by C. Becker (University 
Medical Center Erlangen, Germany) and crossed to tamoxifen-inducible Tie2- 
CreER^T2 animals” to obtain Tie2-CreER^T2;Caspase-8^loxP/loxP animals (Casp8^ECKO). 
To generate RIPK3 conditional knockout animals, an 880-bp fragment containing 
loxP-flanked exons 2 and 3 from Ripk3 (to obtain a premature stop codon upon 
cre-mediated recombination) as well as the 5’ homology arm and the 3’ homology 
arm was amplified from BAC RPCI-23-237G18 (Children’s Hospital Oakland 
Research Institute) and cloned into the pKOII targeting vector additionally 
containing a Frt-flanked neomycin resistance gene (neo®) and the diphtheria toxin 
A gene (dta) as negative selection marker. The targeting vector was linearized 
with NotI and introduced into V6.5 ES cells by electroporation. Upon treatment 
with 400 µg/ml G418, DNA from 400 clones was isolated, and screened for correct 
recombination by Southern blot. Two independent ES cell clones were injected into 
C57BL/6 blastocysts, which were subsequently transferred to pseudo-pregnant 
females to generate chimaeric offspring. Male chimaeras were bred with C57BL/6 
female mice to produce heterozygotes. The germ line transmission was confirmed 
in the F1 generation using a PCR genotyping strategy. Mice were then crossed 
to Flp-deleter mice to remove the neomycin cassette and thereafter crossed 
with Tie2-CreERT2 animals to obtain Tie2-CreERT2;RIPK3loxP/loxP. A loxP-PCR 
reaction was used for detection of the WT allele (+) and the floxed (fl) allele 
using the primers RIPK3-fl_fwd (5′-GATCCAGAGCTCCACGCCAAG-3′) and 
RIPK3-fl_rev (5′-TGGAGGACCAGAGGGAAGGT-3′) resulting in band sizes 
of 295 bp (WT) and 340 bp (fl), respectively. To induce recombination, animals 
were treated with 1 mg/d tamoxifen (Sigma) for 5 consecutive days and 7-9 d later 
experiments were started. Caspase-8 and RIPK3 deletion in ECs was confirmed by 
comparing protein levels on isolated ECs from lungs of induced knockout animals 
(Tie2-CreERT2;Caspase-8loxP/loxP (= Casp8ECKO) or Tie2-CreERT2;RIPK3loxP/loxP 
(= RIPK3ECKO)) with ECs from lungs of Cre-negative control animals 
(Caspase-8loxP/loxP or RIPK3loxP/loxP) using western blot. Quantification of in vivo 
permeability was performed using the Miles assay. Briefly, 11 d after induction of 
the knockout by tamoxifen, mice received a 100 µl tail vein injection of 0.5% Evans 
blue dye in PBS. After 30 min mice were killed, and extravasated blue dye was 
eluted from the lungs with formamide at 56 °C and measured by spectrometry at 620 nm. 
Lungs cultured ex vivo for 18 h after i.v. injection of 1 ml staurosporine (10 µM) 
served as positive controls for stainings with antibodies against cleaved 
caspase-3 or annexin-V or with TUNEL assay. To generate MLKL−/− animals 
a TALEN pair targeting the second exon of the Mlkl gene was designed using 
the TAL Effector Nucleotide Targeter 2.0 
(https://tale-nt.cac.cornell.edu/node/add/talen). A pair targeting ~70 bp 
downstream of the ATG with a 19 bp spacer was chosen 
(Mlkl_Talen_l: 5′-GCCGGAAACAATGCCAGCGT-3′ 
and Mlkl_Talen_r: 5′-GCCTGCTACAGCCTCTCCAG-3′). Corresponding 
repeat-variable di-residues (RVDs) (NH HD HD NH NH NI NI NI HD NI NI 
NG NH HD HD NI NH HD NH NG for Mlkl_Talen_l and HD NG NH NH 
NI NH NI NH NH HD NG NH NG NI NH HD NI NH NH HD for Mlkl_Talen_r) 
were cloned into the RCIscript-GoldyTALEN (Addgene 38142) using 
the Golden Gate TALEN and TAL Effector Kit 2.0 (Addgene 1000000024)* for 
subsequent in vitro transcription (mMESSAGE mMACHINE T3 Transcription 
Kit, Ambion). mRNA was purified using RNA spin columns (NucleoSpin RNA, 
Macherey Nagel) and microinjected into fertilized oocytes from B6D2F1 mice 
at a concentration of 25 or 50 ng/µl. Surviving embryos were transferred into 
pseudo-pregnant fosters the next day. Tails from the F0 generation were analysed 
by PCR (primers type_fwd 5′-GTCTTGCACGGTGGAGGTAT-3′ and type_rev 
5′-CCCAGACGTCTCTCAGCTTC-3′ resulting in a 440 bp product) followed 
by T7 endonuclease I assay (T7EI; T7 endonuclease I, NEB) to detect mutant 
offspring. Mutant mouse 64 was analysed by sequencing (GATC-Biotech) and 
the 8 bp deletion (Δ8) allele, which was predicted to result in a knockout, was 
bred to homozygosity. Homozygous mice did not show an obvious phenotype 
and reproduced normally. Spleens and lungs of homozygous MLKL mutant 
mice were analysed for MLKL protein expression by western blot (MLKL: 
AP14272b, Abgent; α-tubulin: T6074, Sigma-Aldrich). For genotyping of 
MLKL−/− animals the type_rev primer (see above) in combination with 
MLKL_TAL_wt 5′-ATGCCAGCGTCTAGGAAACC-3′ or MLKL_TAL_mut 
5′-GGAAACAATGCCAGCGTCAGT-3′ was used to obtain a 245 bp WT or 
244 bp knockout band, respectively. 
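
The RVD lists given for the two TALEN arms can be checked against the stated target sequences. The following is a minimal sketch, assuming the commonly used RVD code (NI→A, HD→C, NG→T, NH→G) and assuming that the right arm binds the opposite strand, so that its RVDs read as the reverse complement of the listed Mlkl_Talen_r sequence. The function names are illustrative and are not part of the original protocol.

```python
# Assumed RVD-to-nucleotide code (illustrative; NH is treated as G-specific).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NH": "G"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def decode_rvds(rvds: str) -> str:
    """Translate a space-separated RVD string into the DNA sequence it reads."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds.split())


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# RVD lists and target sites as read from the Methods text.
LEFT_RVDS = "NH HD HD NH NH NI NI NI HD NI NI NG NH HD HD NI NH HD NH NG"
RIGHT_RVDS = "HD NG NH NH NI NH NI NH NH HD NG NH NG NI NH HD NI NH NH HD"
LEFT_SITE = "GCCGGAAACAATGCCAGCGT"
RIGHT_SITE = "GCCTGCTACAGCCTCTCCAG"

# The left arm is compared directly; the right arm on the opposite strand.
left_ok = decode_rvds(LEFT_RVDS) == LEFT_SITE
right_ok = reverse_complement(decode_rvds(RIGHT_RVDS)) == RIGHT_SITE
print(left_ok, right_ok)
```

A mismatch in either comparison would point to a transcription error in the RVD list or the target sequence rather than to a problem with the construct itself.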

Identification of endothelial receptors involved in the regulation of TC-induced 
EC death. First, expression levels of 132 genes encoding members of receptor 
families known to mediate different forms of programmed cell death were 
analysed using cDNA from HUVEC, HMVEC-L or mouse lung ECs (MLECs) 
(Supplementary Table 1). RNA isolation and cDNA transcription were performed 
using standard protocols (Qiagen, Roche) and quantitative PCR was performed 
using the Universal ProbeLibrary System Technology (Roche). Genes 
were considered to be expressed when the corresponding cDNA was at a level 
>10~°-fold compared with the cDNA level of GAPDH. Genes for further screen- 
ing were selected on the basis of the following two criteria: first, the gene is highly 
expressed in at least one of the three tested ECs (among the top 22 genes tested); 
second, the gene is expressed in HUVECs (to be able to perform the screen), result- 
ing in a total of 44 genes for testing. The screen to identify potential receptors 
that mediate TC-induced EC death was performed in 96-well format. HUVECs 
were transfected using siRNA from Sigma for the initial screen. Only siRNAs 
resulting in expression levels <25% as determined by quantitative PCR were 
used. Gene knockdowns in HUVECs cultured in the absence of TCs resulting 
in monolayers of less than 75% confluency compared with HUVECs transfected 
with scrambled control-siRNA were not included in the final analysis. Individual 
experiments where the knockdown of RIPK3 (used as positive control) did not 
result in a decreased ratio of TC-induced EC death compared with cells transfected 
with scrambled siRNA were not included in the final analysis. Forty-eight hours 
after gene silencing, 2 x 10? MDA-MB-231-GFP TCs were seeded on confluent 
monolayers of HUVECs in the presence of 100 µM z-VAD-fmk. After overnight 
co-culture, the number of dead ECs was determined as described above. The screen 
was performed in five independent rounds with duplicates in each round. The 
ratio of EC necroptosis for each condition was defined as the effect of each siRNA 
to alter TC-induced endothelial necroptosis compared with control HUVECs 
(scrambled siRNA) treated with TCs. Background cell death in HUVECs transfected 
with scrambled siRNA and cultured without TCs was determined for each 
plate individually and subtracted from each value. For values >1, gene knockdown 
resulted in an increase in TC-induced endothelial necroptosis; for values = 1, gene 
knockdown resulted in no change; for values <1, gene knockdown resulted in a 
decrease in TC-induced endothelial necroptosis. Independent siRNAs from Qiagen 
were used for testing potential hits in independent experiments. 
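
The inclusion criteria and the ratio described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the background is assumed to be subtracted from both the siRNA value and the scrambled-siRNA control before the ratio is taken, and all function and parameter names are hypothetical.

```python
def passes_qc(residual_expression: float, relative_confluency: float) -> bool:
    """Apply the two inclusion criteria stated in the text.

    residual_expression: target mRNA level after knockdown, as a fraction of
        the scrambled-siRNA control (must be below 0.25).
    relative_confluency: monolayer confluency without TCs, as a fraction of
        the scrambled-siRNA control (must be at least 0.75).
    """
    return residual_expression < 0.25 and relative_confluency >= 0.75


def necroptosis_ratio(dead_sirna: float, dead_scrambled: float,
                      background: float) -> float:
    """Ratio of TC-induced EC death for one siRNA versus the scrambled control.

    Assumption: the plate-specific background (scrambled siRNA, no TCs) is
    subtracted from both values before dividing.
    """
    corrected_control = dead_scrambled - background
    if corrected_control <= 0:
        raise ValueError("control must exceed background to form a ratio")
    return (dead_sirna - background) / corrected_control


def interpret(ratio: float, tolerance: float = 1e-9) -> str:
    """Map a ratio to the three cases listed in the text."""
    if ratio > 1 + tolerance:
        return "increase"
    if ratio < 1 - tolerance:
        return "decrease"
    return "no change"


# Made-up counts: 30 dead ECs with the siRNA, 50 with scrambled siRNA,
# 10 in the no-TC background well.
example = necroptosis_ratio(30, 50, 10)
print(example, interpret(example))
```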

Irradiation procedure and bone marrow transplantation. DR6+/+ WT or 
DR6−/− mice at the age of 8-10 weeks were subjected to total body irradiation 
with 8.5 Gy. The X-ray machine (RadSource RS-2000) was operated at 160 kV, 
4.2 kW at a dose rate of 1.2 Gy/min in air. Bone marrow cells were harvested by 
gently flushing both femora and both tibiae with DMEM (GIBCO). Cells were 
centrifuged, resuspended in HBSS (without Mg²⁺ or Ca²⁺), and 5 × 10⁷ cells in 
100 µl (viability >95%) were injected i.v. into irradiated recipient mice on the day 
of irradiation. TCs were injected 6 weeks after transplantation, and numbers of 
EC deaths and extravasated TCs were determined 6h later. The number of surface 
metastases was determined after 12 days. 

Generation of APP-expressing cells. To generate HEK cells stably expressing 
membrane-bound full-length huAPP695 (HEKAPP), HEK293T cells were 
transfected using a modified FUGW vector expressing huAPP695 with an amino 
(N)-terminal myc-tag (mycAPP) under control of the ubiquitin promoter and 
a puromycin resistance marker for the selection. For the generation of COS-1 
cells expressing the membrane-bound full-length form of huAPP695 (COS-1APP) 
or the soluble huAPPs695-α fragment (COS-1APPsα), COS-1 cells were transiently 
transfected using Lipofectamine 2000 and the plasmids pCAX APP 695 or pCAX 
APPs-695-α, respectively. Cells were used 24 h after transfection for further 
analyses. For the detection of APP, cells were lysed in CHAPS Lysis Buffer or 
media were conditioned for 24 h and further processed for western blotting using 
anti-APP (MAB348, Millipore (clone 22C11)). β-Tubulin (MAB3408, Millipore) 
served as loading control. pCAX APP 695 and pCAX APPs-695-α were a gift from 
D. Selkoe and T. Young-Pearse (Addgene plasmids 30137 and 30147). 
Expression analyses. Cells from individual wells of a 6- or 96-well plate were 
harvested (1.5 x 10° or 5 x 10° at seeding) and RNA was isolated and transcribed 
into cDNA according to the manufacturers’ protocols (Qiagen, Roche). For 
quantitative RT-PCR using the Roche LightCycler480 Probes Master System, 30 ng 
cDNA per reaction were used. Primers were designed with the online tool provided 
by Roche, and only primer pairs from the top three results were chosen. Relative 
expression levels were obtained by normalizing with GAPDH. For single-cell gene 
expression analyses, RNA isolation and cDNA synthesis was performed using the 
96-well C1 Single-Cell Auto Prep Array for mRNA Seq (17-25 µm) from Fluidigm. 
qRT-PCR was performed with the LightCycler480 Probes Master System (Roche) 
using intron-spanning primers. Relative expression values from individual cells 
were determined by the 2^(LoD Ct − Ct) method with a limit of detection Ct (LoD Ct) 
value set to 35 cycles, and values were normalized to the cell with the highest gene 
expression (100%). Values from empty wells or wells that contained more than 
one cell, as determined by visual inspection, were excluded from the final analysis. 
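
The single-cell calculation described above can be sketched as follows. This is a minimal illustration, assuming that cells with no amplification or with a Ct above the limit of detection are scored as zero expression (the text does not state how such cells were handled); the function names are hypothetical.

```python
from typing import List, Optional

LOD_CT = 35.0  # limit of detection Ct stated in the text


def relative_expression(ct: Optional[float], lod_ct: float = LOD_CT) -> float:
    """Relative expression as 2^(LoD Ct - Ct).

    Assumption: undetected cells (ct is None) and cells with Ct above the
    limit of detection are assigned an expression of 0.
    """
    if ct is None or ct > lod_ct:
        return 0.0
    return 2.0 ** (lod_ct - ct)


def normalize_to_highest(ct_values: List[Optional[float]],
                         lod_ct: float = LOD_CT) -> List[float]:
    """Express each cell as a percentage of the highest-expressing cell."""
    raw = [relative_expression(ct, lod_ct) for ct in ct_values]
    highest = max(raw, default=0.0)
    if highest == 0.0:
        return [0.0 for _ in raw]
    return [100.0 * value / highest for value in raw]


# Made-up Ct values for four cells; None marks a cell with no amplification.
percentages = normalize_to_highest([25.0, 26.0, None, 36.0])
print(percentages)
```

Because one Ct cycle corresponds to a twofold difference, a cell one cycle later than the highest-expressing cell is reported at 50%.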
Cells. Human primary ECs and media were from Lonza. MDA-MB-231-GFP 
TCs were from AntiCancer. THP-1, A549, PC3, MeWo and SK-MEL-28 cells 
were from CLS. COS-1, B16F10 and LLC1 were from ATCC. L929 cells were a 
gift from J. Wiegers (Biocentre Innsbruck, Austria), U-87 MG cells were from S. 
Rieken (University Hospital, Heidelberg, Germany), MIA PaCa-2 and CFPAC-1 
cells were from N. Giese (University Hospital, Heidelberg, Germany), Sh-SY5Y, 
HeLa and HT1080 were from M. Bahr (DKFZ, Germany) and MOLT-4 cells 
were from J. Witkowski (Medical University of Gdansk, Poland). All cells were 
incubated at 37 °C and 5% CO2. HUVECs and HMVECs-L were cultured in growth 
factor-supplemented EGM2 or EGM2-MV medium, respectively, and passages 
<P6 were used for all experiments. All other cell lines were cultured in either RPMI 
or DMEM supplemented with 10% FBS, penicillin/streptomycin (100 units/ml) 
and glutamine (2 mM). Cells were tested negative for mycoplasma contamination 
before experiments. The cell lines are not listed in the International Cell Line 
Authentication Committee (ICLAC) database. Primary mouse lung ECs were 
isolated and cultured as described previously*®. For knockout induction in vitro, 
cells were treated with 4-OH tamoxifen (1 µM) for 48 h. 

Other materials. Media and supplements were from Life Technologies. The 
following reagents were used: tamoxifen (Sigma), 4-OH tamoxifen (Sigma), Nec-1 
(Enzo), Nec-1s (BioVision), z-VAD-fmk (Alexis), z-VAD(OMe)-fmk (Cayman), 
1-MT (Sigma), etanercept (Pfizer), rhDR6-Fc (144-DR, R&D), rhIgG1-Fc (110-HG, 
R&D), rmDR6-Fc (6985-DR, R&D), rmIgG2a-Fc (4460-MG, R&D), calcein-AM 
(AAT Bioquest), CFSE (Alexis), DAPI (Life Technologies), TUNEL staining kit 
(Roche). The following antibodies were used for western blot analyses: RIPK3 
(ab56164, Abcam), caspase-8 (3473, ProSci), DR6 (7678R, Bioss), VE-cadherin 
(sc-6458, Santa Cruz), α-tubulin (T9026, Sigma). The following antibodies were 
used for immunohistochemistry: anti-CD31 (human) (NB600-562, Novus), 
anti-CD31 (mouse) (550274, BD Biosciences), anti-DR6 (7678R, Bioss), 
anti-cleaved caspase-3 (9661S, Cell Signaling), anti-annexin-V (sc-4252, Santa Cruz), 
anti-ERG (ab110639, Abcam), anti-CD45 (553082, BD Biosciences), 
anti-phospho-MLKL (ab187091, Abcam). Antagonistic anti-DR6 (clone 5D10) and control IgG1 
(clone MOPC-21) antibodies were provided by S. Mi (Biogen Idec, Cambridge, 
USA). Recombinant APPsα was purified as described previously. 

Human samples. Frozen human tissue samples were obtained from Zyagen. 
Experiments with human samples were performed according to the regulations 
of the local ethics committee of the Hessian Regional Medical Board, and informed 
consent was obtained from all subjects. 
Statistical analysis. Trial experiments or experiments done previously were used 
to determine sample size with adequate statistical power. In vitro experiments 
were not randomized and the investigators were not blinded to them, whereas the 
in vivo experiments were randomized and investigators were blinded. Samples 
were excluded in cases where RNA/cDNA quality or tissue quality after processing 
was poor (below commonly accepted standards). Animals were excluded from 
experiments if they showed any signs of sickness. Data represent biological 
replicates. In all studies, comparison of mean values was conducted with unpaired, 
two-tailed Student’s t-test or one-way or two-way ANOVA with Bonferroni's post 
hoc test. In all analyses, statistical significance was determined at the 5% level 
(P < 0.05). Depicted are mean values ± s.d. or ± s.e.m. as indicated in the figure 
legends. Statistical analysis was performed with Prism5 or Prism6 (GraphPad) or 
Excel (Microsoft) software. 
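
For orientation, the summary statistics reported throughout (mean ± s.d. or ± s.e.m.) and the 5% significance threshold with a Bonferroni correction can be sketched as below. This is a simplified illustration using only the Python standard library, not a reproduction of the Prism analysis: the Bonferroni step here simply multiplies each P value by the number of comparisons, whereas a post hoc test after ANOVA also uses the pooled variance. All names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def summarize(values: List[float]) -> Tuple[float, float, float]:
    """Return (mean, s.d., s.e.m.) for one group of biological replicates.

    s.d. is the sample standard deviation (n - 1 denominator) and
    s.e.m. = s.d. / sqrt(n). At least two values are required.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(n)
    return mean, sd, sem


def bonferroni_significant(p_values: List[float],
                           alpha: float = 0.05) -> List[bool]:
    """Flag comparisons that remain below alpha after Bonferroni adjustment."""
    m = len(p_values)
    return [min(1.0, p * m) < alpha for p in p_values]


# Made-up data: three animals in one group, three pairwise comparisons.
mean, sd, sem = summarize([1.0, 2.0, 3.0])
flags = bonferroni_significant([0.01, 0.02, 0.04])
print(mean, sd, sem, flags)
```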
Extended Data Figure 1 | TCs induce necroptosis in ECs. a, Criteria for 
the discrimination of apoptotic and necrotic EC death. Fluorescent images 
of HUVECs cultured under conditions to induce apoptosis (TRAIL, 
staurosporine) or to induce necrosis (H2O2, hypoxia (1% O2)) and 
stained with Hoechst 33342 (blue) and the membrane-impermeable dye 
ethidium homodimer III (EthD-III, red). Asterisks indicate apoptotic ECs 
(condensed, fragmented nuclei, EthD-III-negative); closed arrowheads 
indicate necrotic ECs (normal nuclei, EthD-III-positive). Late apoptotic 
cells are indicated by open arrowheads (condensed/fragmented nuclei, 
EthD-III-positive); scale bar, 5 µm. b, c, No annexin-V-positive cells were 
detected in HUVECs cultured in the presence of TCs (MDA-MB-231). 
Stimulation with TNF-α served as positive control; scale bar, 20 µm. 
d, Fluorescent images of L929 cells stimulated with TNF-α to induce 
programmed necrosis (necroptosis, arrowheads). This effect was 
reversed when cells were additionally cultured with the RIPK1 inhibitor 
necrostatin-1 (Nec-1); scale bar, 10 µm. e, Effect of freshly isolated PBMCs 
on EC necrosis either directly or in the presence of MDA-MB-231 TCs. 
PBMCs contained 20 times the amount of platelets (that is, 3 x 104, 9 x 104 
or 3 x 10° platelets). f-h, Quantification of necrosis in HMVEC-L (f), 

in freshly isolated primary mouse lung ECs (prim. MLEC) (g) or in 
HUVEC (h) cultured in the presence of different human and mouse 
TCs and at different concentrations as indicated. i, Representative 
confocal images of HUVEC cultured in the absence of TCs (-TC) or 
presence of TCs (+TC, green) and stained as indicated; scale bar, 5 µm. 
Quantification of EthD-III- and phospho (p)-MLKL-double-positive 
ECs (more than 50 EthD-III-positive cells were analysed). j, k, Analysis 
of knockdown efficiencies in HUVEC by western blot for RIPK3 and 
caspase-8 (j) or by quantitative RT-PCR for MLKL (k). α-Tubulin served 
as loading control in j and relative mRNA expression levels normalized 
to GAPDH and to the level detected in scramble siRNA-treated samples 
(siCTRL) are shown in k. Shown are representative data of two (c, g) 
or three (e, f, h) independent experiments with mean values ± s.e.m. 
from biological sextuplicates (n = 6). *P < 0.05; **P < 0.01; ***P < 0.001; 
n.s., not significant. One-way ANOVA and Bonferroni's post hoc test 
(c, e, g, h) or unpaired, two-tailed Student's t-test (f). For gel source data 
see Supplementary Fig. 1. 
Extended Data Figure 2 | TCs induce necroptosis in ECs in vivo. 
a, Quantitative evaluation of cleaved caspase-3- and EthD-III-positive 
cells in lungs of C57BL/6 WT animals at the indicated time points after i.v. 
injection of B16 TCs on the basis of the analysis of confocal microscopy 
images as shown in main Fig. 2a, b. TCs were injected at the same time 
and lung isolation occurred at the indicated time points. Injection of PBS 
served as control. b, Quantification of EthD-III-positive ECs in lungs of 
WT animals 6 h after i.v. injection of B16 or LLC1 TCs. c-e, Representative 
confocal images of lung sections taken 6 h after i.v. injection of B16 
or LLC1 TCs (c, d) or injection of equal amounts of fluorescently labelled 
15 µm microspheres (e) into WT animals and stained for the indicated 
markers. Arrowheads in c indicate EthD-III-positive cells. Isolated lungs 
from animals cultured ex vivo in the presence of staurosporine served as 
positive controls in d. Scale bar, 20 µm. Shown are representative data of 
two (a) or three (b) independent experiments with mean values ± … 
from n = 3 animals per time point (a) or n = 6 animals per group (b). 
**P < 0.01; ***P < 0.001. One-way ANOVA and Bonferroni's 
post hoc test. 
Extended Data Figure 3 | Targeting strategy for the generation of 
mice with loxP-flanked (floxed (fl)) Ripk3 allele. a, Targeting scheme 
for the generation of floxed RIPK3 including 3′ Southern screening for 
the identification of positive ES cell clones and 3′ and 5′ Southern blot 
confirmation of heterozygous and homozygous floxed Ripk3 alleles, 
respectively. b, Western blot analysis of RIPK3 and VE-cadherin (VE-Cad) 
in primary ECs isolated from lungs (MLEC) of tamoxifen-treated 
Tie2-CreERT2;RIPK3loxP/loxP animals (RIPK3ECKO). Cre-negative littermates 
served as control. α-Tubulin served as loading control. c, Quantification 
of Evans blue permeability in the lungs of RIPK3ECKO animals. 
d, Quantification of transmigrated B16 or LLC1 TCs over a layer of 
DMSO- or 4-OH-tamoxifen-treated primary MLEC isolated from 
uninduced Tie2-CreERT2;RIPK3loxP/loxP animals or Cre-negative control 
littermates. e, f, Quantification of lung metastases 12 d after i.v. injection 
(e) or 27 d after excision of a primary tumour induced by s.c. injection 
(f) of B16 or LLC1 TCs into RIPK3ECKO animals. Cre-negative littermates 
served as control. No significant differences in primary tumour growth 
were observed (data not shown). Shown are representative data of three 
(c-e) or two (f) independent experiments with mean values ± s.d. from 
n = 3 (c) or n = 6 (e) animals per group or from n = 11 (B16, control), 
n = 6 (B16, RIPK3ECKO), n = 8 (LLC1, control) and n = 6 (LLC1, 
RIPK3ECKO) animals (f) or n = 6 wells per condition (d). *P < 0.05; 
**P < 0.01; n.s., not significant. Unpaired, two-tailed Student's t-test 
(c, e, f) or one-way ANOVA and Bonferroni's post hoc test (d). For gel 
source data see Supplementary Fig. 1. 
Extended Data Figure 4 | Reduced TC-induced endothelial necroptosis 
and metastasis in MLKL−/− mice. a, A TALEN pair targeting the 
indicated sequences in the second exon of the mouse Mlkl gene was 
cloned and transcribed into mRNA. b, Analysis of mice born after mRNA 
injection into fertilized oocytes by PCR for exon 2 (top) and 
subsequent sequence-specific endonuclease assay (T7EI, bottom) for 
the detection of mutant alleles. DNA from C57BL/6 (WT) or 
TALEN-transfected cells (mut) served as control. c, The mutant allele of mouse 
64 was further analysed using Sanger sequencing, revealing an 8 bp deletion 
(Δ8), predicted to generate a premature stop codon. d, Spleen and lung 
extracts of three B6D2F1 MLKL+/+ WT and three homozygous MLKL 
mutant mice (MLKL−/−) were probed for Mlkl protein expression. 
Polyclonal MLKL antibody detected Mlkl protein only in WT extracts 
but also an additional non-specific band of similar size. e-i, B6D2F1 
MLKL+/+ WT or MLKL−/− animals were injected i.v. with B16 TCs and 
lungs were analysed after 6 h for pulmonary EthD-III-positive ECs (f) and 
extravascular TCs (g) or after 12 d for lung metastases (i). Representative 
confocal images of lung sections and images of whole lungs are shown in 
(e, h); scale bar, 50 µm. Shown are representative data of two independent 
experiments with mean values ± s.e.m. (f, g) or ± s.d. (i) from n = 3 
animals per condition (f, g) or n = 8 (MLKL+/+) and n = 11 (MLKL−/−) 
animals (i). **P < 0.01; ***P < 0.001. Unpaired, two-tailed Student's t-test. 
For gel source data see Supplementary Figs 1 and 2. 
Extended Data Figure 5 | Effect of Nec-1 and Nec-1s treatment 
on metastasis formation. a-c, Effect of Nec-1 (30 µM) on B16 TC 
proliferation (a), viability (b) and migration (c). d, Representative confocal 
images of lung sections taken 6 h and images of whole lungs taken 
12 d after i.v. injection of B16 TCs into WT animals treated with DMSO 
(control) or Nec-1; scale bar, 50 µm. e, f, Lung metastases 12 d after i.v. 
injection of LLC1 TCs into WT animals treated with Nec-1 (e) or B16 
TCs injected into WT animals treated with stable Nec-1s (f). DMSO 
served as control. g, h, Human MDA-MB-231 TCs were injected i.v. into 
Nec-1-treated immunodeficient SCID mice and pulmonary 
EthD-III-positive ECs and lung metastases were analysed after 6 h and 12 d, 
respectively. Arrowheads in the H&E-stained lung sections (h) indicate 
metastases; scale bar, 50 µm. In all metastasis experiments, animals were 
treated with Nec-1 or Nec-1s shortly before and at 3 h after TC injection 
(plus at 6 h for the 12 d experiment). Shown are representative data 
of three (a-c, e, g) or two (f, h) independent experiments with mean 
values ± s.e.m. (a, b, g) or ± s.d. (c, e, f, h) from biological sextuplicates 
(n = 6) (a, b), triplicates (n = 3) (c) or from n = 3 (g) or n = 6 animals 
per condition (e, f, h). *P < 0.05; **P < 0.01; ***P < 0.001; n.s., not 
significant. Unpaired, two-tailed Student's t-test (a-c, e, f, h) or one-way 
ANOVA and Bonferroni's post hoc test (g). 
Extended Data Figure 6 | Effects of zVAD treatment and EC-specific 
loss of caspase-8 on metastasis formation. a, Western blot analysis of 
uncleaved caspase-8 (Casp8) and VE-cadherin (VE-Cad) in primary 
MLECs of tamoxifen-treated Tie2-CreERT2;Casp8loxP/loxP animals 
(Casp8ECKO). b, Quantification of Evans blue permeability in the lungs of 
Casp8ECKO animals. c, Quantification of transmigrated B16 or LLC1 TCs 
over a layer of DMSO- or 4-OH-tamoxifen-treated primary MLEC from 
uninduced Tie2-CreERT2;Casp8loxP/loxP animals or Cre-negative control 
littermates. d, e, Quantification of lung metastases 12 d after i.v. injection 
(d) or 27 d after excision of a primary tumour induced by s.c. injection 
(e) of B16 or LLC1 TCs into Casp8ECKO animals. Cre-negative littermates 
served as control. No significant differences in primary tumour growth 
were observed (data not shown). f-h, Effect of z-VAD-fmk (zVAD, 
100 µM) on B16 TC proliferation (f), viability (g) and migration (h). 
i, Representative confocal images of lung sections taken 6 h and images 
of whole lungs taken 12 d after i.v. injection of B16 TCs into WT animals 
treated with DMSO (control) or zVAD shortly before and at 3 h after 
TC injection (plus at 6 h for the 12 d experiment); scale bar, 50 µm. Shown 
are representative data of two (b, c, e) or three (d, f-h) independent 
experiments with mean values ± s.d. (b-e, h) or ± s.e.m. (f, g) from n = 3 
(b) or n = 5 (d) animals per group or from n = 11 (B16, control), n = 7 
(B16, Casp8ECKO), n = 8 (LLC1, control) and n = 6 (LLC1, Casp8ECKO) 
animals (e) or from biological sextuplicates (n = 6) (c, f, g) or triplicates 
(n = 3) (h). *P < 0.05; ***P < 0.001; n.s., not significant. Unpaired, 
two-tailed Student's t-test (b, d-h) or one-way ANOVA and Bonferroni's 
post hoc test (c). For gel source data see Supplementary Fig. 1. 
Extended Data Figure 7 | DR6 is expressed in ECs, and DR6 expressed 
in immune cells is not involved in TC metastasis. a, Western blot analysis 
of lysates from HUVECs transfected with scrambled siRNA (siCTRL) or 
different sets of siRNAs directed against mRNA encoding DR6 (siDR6). 
The antibody detects the 90 kDa glycosylated form of DR6. α-Tubulin 
served as loading control. b, Expression levels of DR6 in different human 
or mouse ECs as determined by quantitative PCR. Shown are relative 
expression levels normalized to GAPDH levels. c, Confocal images 
of human tissues of the indicated origin stained for CD31 (red), DR6 
(green) and cell nuclei (DAPI, blue) (left panel). Control-IgG antibody 
and donkey anti-rabbit secondary antibody coupled to AF488 served as 
negative controls (right panels); scale bar, 541m. d, Analysis of HUVEC 
single-cell gene expression for DR6 as determined by the 2^(LoD Ct − Ct) method 
and normalized to the cell with the highest expression level (100%). Each 
bar represents the gene expression level of one individual cell (data of a 
total of 80 cells analysed are shown). Single-cell analysis revealed that less 
than 10% of ECs express DR6. e-j, Irradiated DR6+/+ or DR6−/− animals 
were reconstituted with bone marrow cells from DR6+/+ or DR6−/− donor 
animals, respectively (DR6+/+ → DR6+/+, DR6−/− → DR6+/+ or DR6+/+ 
→ DR6−/−) and quantitative PCR analysis of DR6 expression in PBMCs 
was performed (e) or bone marrow chimaeras as indicated were injected 
i.v. with B16 TCs and lungs were analysed after 6 h for pulmonary 
EthD-III-positive ECs (g) and extravascular TCs (h) or after 12 d for lung 
metastases (j). f, i, Representative confocal images of lung sections stained 
for the indicated markers (f) and images of whole lungs (i); scale bar, 
50 µm. Shown are representative data of two independent experiments 
with mean values ± s.d. (e, j) or ± s.e.m. (g, h) from n = 3 (e) or n = 4 
(g, h) animals per condition or n = 6 (DR6+/+ → DR6+/+), n = 4 (DR6−/− 
→ DR6+/+) and n = 5 (DR6+/+ → DR6−/−) animals (j). **P < 0.01; 
n.s., not significant. One-way ANOVA and Bonferroni's post hoc test. 
For gel source data see Supplementary Fig. 1. 
Extended Data Figure 8 | Effects of anti-DR6 antibody and DR6-Fc on 
tumour metastasis. a-i, Quantification of MDA-MB-231 TC-induced 
EC necroptosis (a, c, g), transmigrated TCs over an endothelial layer 
(b, e, h), B16 TC-induced EC necroptosis after i.v. injection into C57BL/6 
WT animals (d) or lung metastases 12 d after i.v. injection of LLC1 TCs 
into WT animals (f, i) upon treatment with an anti-DR6 antibody (5D10) 
(a, b, f), etanercept (c-e) or the extracellular domain of DR6 fused to Fc 
(DR6-Fc) (g-j). Animals were treated shortly before and 3 h after 
TC injection (d) as well as 6 h after TC injection (f, i). PBS, an IgG1 
antibody or the Fc domain of IgG1 or IgG2a (IgG1-Fc, IgG2a-Fc) served as 
controls. j, Representative confocal images of lung sections taken 
6 h and images of whole lungs taken 12 d after i.v. injection of B16 TCs 
into WT animals treated with IgG2a-Fc (control) or DR6-Fc shortly before 
and at 3 h after TC injection (plus at 6 h for the 12 d experiment); scale bar, 
50 µm. Shown are representative data of three independent experiments 
with mean values ± s.e.m. (a-e, g, h) or ± s.d. (f, i) from biological 
sextuplicates (n = 6) (a-c, e, g, h) or from n = 3 (d) or n = 4 (f, i) animals 
per group. *P < 0.05; **P < 0.01; ***P < 0.001; n.s., not significant. 
Two-way ANOVA and Bonferroni's post hoc test (a-d, g, h) or unpaired, 
two-tailed Student's t-test (e, f, i). 
Extended Data Figure 9 | APP is expressed in different murine and 
human TCs. a, Expression levels of APP in different human and mouse 
TCs as determined by quantitative PCR. Shown are relative expression 
levels normalized to GAPDH levels. b, MDA-MB-231 single-cell gene 
expression analysis for APP as determined by the 2^(LoD Ct − Ct) method and 
normalized to the cell with the highest expression level (100%). Each 
bar represents the gene expression level of one individual cell (data of a 
total of 78 cells analysed are shown). Single-cell analysis revealed that all 
TCs express APP. c, Analysis of knockdown efficiency in MDA-MB-231 
TCs using different siRNAs against APP (siAPP). Shown is the relative 
mRNA expression normalized to GAPDH levels and to the level detected 
in scramble siRNA-treated samples (siCTRL). d, Quantification of EC 
necroptosis in HUVECs cultured in the presence of MDA-MB-231 TCs 
(direct contact) or exposed to the supernatant of TC-EC co-cultures after 
18 h of culture (condit. medium). e, Western blot analysis of cell lysates 
or conditioned media (supernat.) of parental HEK293T (HEK) cells, 
HEK cells stably expressing membrane-bound full-length APP695 
(HEKAPP), mock-transfected COS-1 cells (COS-1) or COS-1 cells 
transiently expressing membrane-bound full-length APP695 (COS-1APP) 
or soluble APPsα (COS-1APPsα). Note that secreted APPsα from COS-1 
cells (supernat.) compared with APPsα found in the corresponding cell 
lysates is glycosylated and thus runs at higher molecular mass. Anti-APP 
(22C11) was used to detect APP. Membranes were cut to detect β-tubulin 
as loading control. Shown are representative data of three independent 
experiments with mean values ± s.e.m. from biological sextuplicates 
(n = 6). **P < 0.01; ***P < 0.001. Two-way ANOVA and Bonferroni's post 
hoc test. For gel source data see Supplementary Fig. 1. 
Extended Data Figure 10 | Effects of loss of APP on TCs and on 
metastasis formation. a, Analysis of knockdown efficiency in B16 and 
LLC1 TCs using different siRNAs against mRNA encoding APP (siAPP). 
Shown is the relative mRNA expression normalized to GAPDH levels and 
to the level detected in scramble siRNA-treated samples (siCTRL). 
b-d, Knockdown of APP in B16 or LLC1 TCs (B16siAPP and LLC1siAPP) 
and evaluation of cell proliferation (b), viability (c) and migration (d). 
e, f, Evaluation of APP-deficient TC-induced EC death in vitro in C57BL/6 
WT primary MLECs (e) and the ability of APP-deficient TCs to migrate 
over an endothelial layer (f). g, Quantification of lung metastases 
12 d after i.v. injection of LLC1 TCs with silenced APP expression 
(LLC1siAPP) into WT animals. Shown are representative data of three 
independent experiments with mean values ± s.e.m. (b, c, e) or ± s.d. 
(d, f, g) from biological sextuplicates (n = 6) (b, c, e, f), triplicates (n = 3) 
(d) or from n = 5 animals per condition (g). *P < 0.05; **P < 0.01; 
***P < 0.001; n.s., not significant. Unpaired, two-tailed Student's t-test (b, c, f) 
or one-way ANOVA and Bonferroni's post hoc test (e, g). h, Model: TCs 
induce RIPK1/RIPK3/MLKL-dependent necroptosis in ECs via APP-DR6. 
TCs then may directly pass through the emerging gap after 
EC death. Alternatively, or in parallel, damage-associated molecular 
pattern molecules (DAMPs) released from necroptotic ECs could act on 
TCs and/or non-necroptotic endothelial as well as other cells to promote 
TC extravasation and metastasis. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


doi:10.1038/nature19070 


Global profiling of SRP interaction with nascent polypeptides


Daniela Schibich, Felix Gloge, Ina Pöhner, Patrik Björkholm, Rebecca C. Wade, Gunnar von Heijne, Bernd Bukau & Günter Kramer


Signal recognition particle (SRP) is a universally conserved 
protein-RNA complex that mediates co-translational protein 
translocation and membrane insertion by targeting translating 
ribosomes to membrane translocons¹. The existence of parallel
co- and post-translational transport pathways², however, raises the
question of the cellular substrate pool of SRP and the molecular 
basis of substrate selection. Here we determine the binding sites 
of bacterial SRP within the nascent proteome of Escherichia coli at 
amino acid resolution, by sequencing messenger RNA footprints of 
ribosome-nascent-chain complexes associated with SRP. SRP, on 
the basis of its strong preference for hydrophobic transmembrane 
domains (TMDs), constitutes a compartment-specific targeting 
factor for nascent inner membrane proteins (IMPs) that efficiently 
excludes signal-sequence-containing precursors of periplasmic and 
outer membrane proteins. SRP associates with hydrophobic TMDs 
enriched in consecutive stretches of hydrophobic and bulky aromatic 
amino acids immediately on their emergence from the ribosomal 
exit tunnel. By contrast with current models, N-terminal TMDs are 
frequently skipped and TMDs internal to the polypeptide sequence 
are selectively recognized. Furthermore, SRP binds several TMDs 
in many multi-spanning membrane proteins, suggesting cycles of 
SRP-mediated membrane targeting. SRP-mediated targeting is 
not accompanied by a transient slowdown of translation and is not 
influenced by the ribosome-associated chaperone trigger factor 
(TF), which has a distinct substrate pool and acts at different stages
during translation. Overall, our proteome-wide data set of SRP-binding
sites reveals the underlying principles of pathway decisions
for nascent chains in bacteria, with SRP acting as the dominant 
triaging factor, sufficient to separate IMPs from substrates of 
the SecA-SecB post-translational translocation and TF-assisted 
cytosolic protein folding pathways. 

For determination of the nascent substrates of E. coli SRP, we used 
selective ribosome profiling (SeRP)³, which compares two ribosome profiling
data sets generated from the same culture (Extended Data Fig. 1a).
The translatome data set comprises the ribosome footprints of all
ribosome-nascent-chain complexes (RNCs) and reveals the protein
synthesis activity of the cells. The SRP interactome data set comprises
the footprints of only SRP-engaged RNCs, purified by immunoprecipitation
using SRP-specific antibodies (Extended Data Figs 1b, c). The ratio
of both data sets discloses the nascent SRP interactome and provides 
sequence-resolved interaction profiles of SRP. To enhance the specificity 
of SeRP, we performed cell lysis and SRP-RNC purifications in the 
presence of low concentrations of detergents. SeRP without detergent 
indicated that all relevant features of co-translational SRP action 
described later were consistently detected (Extended Data Fig. 2). 
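
The comparison of the two data sets can be illustrated with a minimal sketch. It assumes that each data set is a mapping from gene name to a list of per-position footprint counts; the function names (`rpm_normalize`, `ratio_profile`) and this data layout are illustrative assumptions and do not reproduce the scripts used in this study.

```python
def rpm_normalize(counts_by_gene):
    """Scale raw footprint counts to reads per million (RPM).

    counts_by_gene: dict mapping a gene name to a list of per-position counts
    (hypothetical layout chosen for this sketch).
    """
    total = sum(sum(profile) for profile in counts_by_gene.values())
    if total == 0:
        raise ValueError("data set contains no reads")
    scale = 1_000_000 / total
    return {gene: [count * scale for count in profile]
            for gene, profile in counts_by_gene.items()}


def ratio_profile(interactome, translatome):
    """Position-wise interactome/translatome ratio for one gene.

    Positions without translatome signal are reported as None, because the
    ratio is undefined there.
    """
    if len(interactome) != len(translatome):
        raise ValueError("profiles must have the same length")
    return [i / t if t > 0 else None for i, t in zip(interactome, translatome)]
```

Dividing RPM-normalized profiles rather than raw counts makes the ratio independent of the sequencing depth of the two libraries.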

We identified SRP substrates by ratio-based analysis of translatome 
and interactome data sets using a threshold of two (SRP interactome/translatome).
Almost all SRP substrates identified are IMPs (Figs 1a, b
and Extended Data Fig. 1d), suggesting that SRP specifically targets this
class of proteins. However, a number of IMPs do not pass the threshold
despite their interaction profiles indicating strong transient SRP
binding. One example is CopA, which transiently binds SRP but has
a ratio of only 0.85 (Extended Data Fig. 1e). We therefore developed
a peak detection algorithm that identifies reproducibly detected SRP 
peaks passing a fivefold threshold. Substrate pools identified by either 
method overlap largely, but not completely, and together reveal SRP 
substrates with high confidence (Fig. 1b and Supplementary Table 1). 
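
The ratio-based criterion can be sketched as follows. This is a simplified illustration under stated assumptions: profiles are RPM-normalized lists per gene, the gene-level ratio is taken as the quotient of summed signal, and a ratio equal to the threshold counts as passing. The gene names in the usage example are hypothetical.

```python
def total_enrichment(interactome_rpm, translatome_rpm):
    """Gene-level enrichment: summed interactome signal over summed translatome signal."""
    translatome_sum = sum(translatome_rpm)
    if translatome_sum == 0:
        return None  # gene not detected in the translatome
    return sum(interactome_rpm) / translatome_sum


def classify_substrates(interactome, translatome, threshold=2.0):
    """Return {gene: ratio} for genes whose enrichment reaches the threshold.

    Treating a ratio equal to the threshold as passing is an assumption of
    this sketch.
    """
    substrates = {}
    for gene, translatome_profile in translatome.items():
        ratio = total_enrichment(interactome.get(gene, []), translatome_profile)
        if ratio is not None and ratio >= threshold:
            substrates[gene] = ratio
    return substrates
```

A gene with a strong but short-lived interaction, such as the CopA example with a ratio of 0.85, fails this whole-gene criterion, which is why a position-resolved peak detection is used in addition.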

Among the 2,367 detected nascent E. coli proteins are 566 SRP interactors,
which according to annotations (Uniprot/Ecocyc) are composed
of 488 IMPs, 14 periplasmic/outer membrane proteins, 50 cytoplasmic 
proteins and 14 proteins with unknown localization. The SRP substrate 
list includes 87% of all IMPs, 6% of all periplasmic/outer membrane 
proteins, 3% of all cytoplasmic proteins and 14% of all proteins with 
unknown localization. Most remaining IMPs do not qualify as SRP 
substrates either because they do not pass the fivefold threshold (12%) 
or because of the reduced reproducibility between replicates (86%, 
Pearson correlation coefficients <0.6), suggesting that these IMPs are 
SRP substrates as well. These findings indicate a universal function 
of E. coli SRP in IMP targeting, consistent with yeast SRP acting as 
the predominant targeting factor for TMD-bearing proteins*. The 
five shortest IMPs (<50 amino acids) detected do not interact with 
SRP (Supplementary Table 1 and Extended Data Fig. 3). Interestingly, 
targeting of two of these (YbgT (also known as CydX), 37 amino 
acids; and YbhT (also known as AcrZ), 49 amino acids) is impaired 
in YidC-depleted cells°, suggesting that membrane insertion of these 
and potentially other short IMPs is mediated by YidC but not SRP. 
This agrees with earlier findings of SRP-independent translocation 
of small peptides into the endoplasmic reticulum of mammalian 
cells’. 

Our data sets imply that periplasmic and outer membrane proteins,
including the assumed SRP substrate DsbA, are efficiently
discriminated by SRP. The single DsbA-SRP interaction peak detected
in detergent-free SeRP (Extended Data Figs 1f and 2b) is too short
to resemble real binding and is not correlated with the DsbA signal
sequence. Reminiscent of the yeast SRP substrate pool, the largest
group of SRP substrates besides IMPs are cytoplasmic proteins, among
them the chaperone DnaK (Extended Data Fig. 1g). DnaK functions in
protein translocation”, is partially membrane associated¹⁰, and cooperates
with a membrane-bound J-protein cochaperone¹¹, suggesting that
nascent DnaK is partially targeted but not inserted into the membrane.
SRP binding to other cytoplasmic proteins may reflect the inherent
error rate in SRP substrate recognition. Their cytoplasmic localization
implies that these RNCs are rejected during subsequent steps of
targeting¹². We do not detect SRP binding to nascent SecA or σ³², which
was suggested to promote SecA folding¹³ and the sensing of proteotoxic
stress by σ³² (ref. 14).

A proteome-wide metagene analysis of translatome and SRP interactome
data shows that, on average, SRP binds RNCs when nascent
chains are 50-100 residues long (Fig. 1c). Our results do not support
earlier findings that SRP non-discriminately binds all ribosomes early
during translation, before the N terminus emerges¹⁵. To determine the
positioning of TMDs facilitating initial SRP engagement, we prepared
a ratio metagene interaction profile of all IMPs aligned to the N
terminus of the first predicted TMD¹⁶ (Fig. 1d). SRP on average binds
an emerging substrate when the N terminus of the first TMD reaches 
a distance of 42 amino acids from the ribosomal peptidyl transferase 
centre (PTC); maximal binding is reached at a TMD distance of 55 
amino acids from the PTC. Assuming that the 25 C-terminal residues 
are protected within the ribosome and that SRP senses the nascent 
chain already in the proximal part of the tunnel, the emergence of about 
17 residues of the TMD suffices for SRP recruitment. 
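
The estimate of about 17 exposed residues follows from subtracting the 25 tunnel-protected residues from the 42-residue distance at which binding sets in. A minimal sketch of this arithmetic is given below; the function name and the optional cap at the TMD length are assumptions made for illustration.

```python
TUNNEL_PROTECTED_RESIDUES = 25  # C-terminal residues assumed to be shielded by the ribosome


def exposed_tmd_residues(tmd_start_distance_from_ptc,
                         protected=TUNNEL_PROTECTED_RESIDUES,
                         tmd_length=None):
    """Number of TMD residues that have left the protected part of the tunnel.

    tmd_start_distance_from_ptc: residues between the TMD N terminus and the PTC.
    tmd_length: optional cap, since no more residues than the TMD contains
    can be exposed (hypothetical parameter of this sketch).
    """
    exposed = max(0, tmd_start_distance_from_ptc - protected)
    if tmd_length is not None:
        exposed = min(exposed, tmd_length)
    return exposed
```

For the onset of binding at 42 residues this gives 42 - 25 = 17 residues, in line with the estimate above.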

Analysing SRP engagement with individual nascent IMPs, we find 
substrates with remarkably different SRP-binding patterns (Fig. 1e and
Extended Data Fig. 4). Some nascent chains are bound when the TMD 
is distant from the ribosome (Extended Data Fig. 4), while others are 
bound already upon partial TMD exposure. Another subset exposes 
two nearby TMDs, suggesting that SRP may bind small segments of 
one TMD or a composite stretch encompassing neighbouring parts of
both TMDs. Finally, we detect some highly sequence-divergent nascent 
chains that trigger SRP binding before the N-terminal TMD emerges, 
generally followed by stronger binding once the TMD is fully exposed 
(Extended Data Figs 4 and 5). The molecular mechanism triggering 
this early SRP binding remains unclear. 

We used our position-resolved information to explore whether SRP 
binding to the nascent proteome may coincide with a change of average 
translation speed, as suggested recently¹⁷. However, the metagene profile
of the translatome of 431 SRP substrates aligned to the position of
initial SRP binding does not reveal an appreciable peak in the translatome,
which would indicate a translation slowdown (Fig. 2a). Thus,
considering the translatome as a whole, fidelity and specificity of 
SRP-mediated targeting in E. coli are not generally controlled by 
variation in translation speed. 
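
A metagene profile of this kind can be sketched as an average of per-gene profiles in a window around an alignment position. The sketch below assumes that each profile is first divided by its own mean so that highly expressed genes do not dominate; this normalization, the function name and the data layout are assumptions and may differ from the analysis performed here.

```python
def metagene_profile(profiles, anchors, upstream, downstream):
    """Average normalized profiles in a window around each gene's anchor.

    profiles: dict gene -> list of per-position read densities.
    anchors:  dict gene -> index of the alignment position (for example the
              position of initial SRP binding).
    Returns a list of length upstream + downstream + 1; window positions not
    covered by any gene are None.
    """
    width = upstream + downstream + 1
    sums = [0.0] * width
    counts = [0] * width
    for gene, profile in profiles.items():
        anchor = anchors.get(gene)
        if anchor is None or not profile:
            continue
        mean = sum(profile) / len(profile)
        if mean == 0:
            continue  # no signal to normalize
        for offset in range(-upstream, downstream + 1):
            position = anchor + offset
            if 0 <= position < len(profile):
                sums[offset + upstream] += profile[position] / mean
                counts[offset + upstream] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

A local translation slowdown would appear as a peak of ribosome density near the anchor, whereas a flat profile indicates no general change in translation speed.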

We noticed some unexpected SRP binding features to nascent 
IMPs. First, in 29% of all substrates identified, SRP fails to bind the 
first TMD but instead binds a more C-terminal TMD. Examples are 
nascent MsbA, UraA and MetI that bind SRP only after emergence of
the second TMD (Fig. 2b and Extended Data Fig. 6a, b). This feature
of delayed targeting is independent of the final orientation (Nin–Cout or
Figure 1 | The SRP interactome. a, Gene
expression levels of the translatome and SRP
interactome are compared in reads per million
(RPM). Open reading frames (ORFs; 2,367) are
coloured according to localization (n = 2).
b, Comparison of SRP substrates identified
by total enrichment (TE) or peak detection
(PD). IMPs, inner membrane proteins; LPs,
lipoproteins; OMPs, outer membrane proteins;
PPs, periplasmic proteins. c, Metagene
read density for translatome and SRP interactome
data sets aligned to the start codon. AU,
arbitrary units. d, Metagene SRP interaction
profile aligned to the N terminus of the initial
TMD. e, Heatmap of TMD positions upon initial
SRP binding.
Nout–Cin) of the skipped TMD in the membrane, and is also observed
in detergent-free SeRP (Extended Data Fig. 2c, d). TMD skipping is
not generally due to false topology predictions of IMPs, since for some
IMPs (MsbA, UraA, MetI) crystal structures are available that indicate
that the skipped TMDs are membrane embedded. To explore whether 
skipping is based on intrinsic features of the TMD or its position in 
the nascent chain, we studied SRP binding to a mutated N-terminal 
fragment of MsbA encoding both TMDs in switched order and fused 
to enhanced yellow fluorescent protein (eYFP; MsbA*-eYFP; Fig. 2c). 
SRP bound MsbA*-eYFP upon emergence of the now N-terminal 
TMD2, demonstrating that binding is controlled by sequence, but not 
position of the TMDs. We also analysed SRP binding to purified RNCs 
that expose TMD1 or TMD2 of MsbA or the DsbA signal sequence on 
the ribosome surface. Supporting SeRP data, salt-resistant SRP binding 
to ribosomes is only conferred by TMD2, but not TMD1, of MsbA, 
or the signal sequence of DsbA (Fig. 2d and Extended Data Fig. 7). 
A second unexpected feature is that 77% of all IMP substrates are bound 
multiple times during synthesis, with binding peaks generally correlat- 
ing with an emerging TMD (Fig. 2b, e and Extended Data Fig. 6c, d). 

Both features do not agree with the currently preferred model of 
co-translational protein targeting, which assumes that SRP engages 
only the most N-terminal TMD¹. To account for our observations,
we considered the possibility that SRP engages RNCs only once, but 
because SRP levels in vivo are low and perhaps limiting, N-terminal 
TMDs with low hydrophobicity may sometimes be missed. The delayed 
SRP binding to more C-terminal TMDs of multi-spanning IMPs will 
generate multiple SRP-binding peaks in our ensemble measurements. 
If this model were true, SRP overexpression should facilitate early SRP 
engagement to the first TMD and reduce binding of SRP to internal 
TMDs. However, even the overproduction of SRP and FtsY from plasmid
did not affect the abundance and amplitude of SRP interaction
peaks in nascent IMPs (Extended Data Fig. 6d). We therefore propose 
that some RNCs may occasionally detach from the translocon and 
require re-targeting by SRP upon emergence of a downstream TMD. 
This model agrees with the observations of SRP-dependent re-initiation 
of translocation in eukaryotes¹⁸ and that length and local hydrophobicity
of nascent chains affect the stability of RNC-translocon complexes
upon purification from E. coli¹⁹.

Mapping of SRP-binding sites allows the identification of nascent 
chain determinants that mediate binding. We first compared the 
computed average Gibbs free energy difference of membrane insertion
(ΔGapp)²⁰ and the Kyte-Doolittle hydrophobicity of N-terminal
TMDs that are SRP-skipped or SRP-bound, and of signal sequences.
Consistent with earlier work²¹, SRP-bound TMDs have higher average
Figure 2 | SRP-RNC interaction.
hydrophobicity and lower ΔGapp than SRP-skipped TMDs (Fig. 3a, b).
Signal sequences have the highest ΔGapp and lowest hydrophobicity, in
particular at their positively charged N termini. Second, we performed
a position-resolved analysis of sequence logos²² to reveal the consensus
SRP-binding motif and its distance from the ribosome surface, and compared
it with logos for skipped TMDs and signal sequences (Fig. 3c).
SRP preferentially binds ribosomes that expose a 12–17-amino-acid-long
stretch enriched in hydrophobic residues (Leu, Val, Ile, Phe),
starting at a distance of 27 amino acids from the PTC, while skipped
TMDs and signal peptides are less hydrophobic. Third, we determined
the sequence features of the ribosome-exposed part of the TMDs upon
SRP binding (Extended Data Fig. 8 and Supplementary Table 2). SRP
prefers binding sites enriched for aliphatic (Met, Leu, Val, Ile) and bulky
aromatic (Phe, Trp, Tyr) residues, and a lower content of helix-breaking
Pro and Gly residues. Fifty-one per cent of all SRP-bound TMDs have
a consecutive stretch of at least four Phe, Ile, Leu or Val residues in
Figure 3 | SRP-nascent-chain-binding properties. a, Analysis of
computed average free energy differences for membrane insertion, ΔGapp.
initial SRP binding, aligned to the C terminus (left; the grey area indicates
the ribosomal tunnel), of skipped TMDs aligned to their N terminus
(centre), and of signal sequences aligned to second amino acid from the
N terminus (right).
Figure 4 | SRP and TF specificity. a, Gene expression levels in translatome
and TF interactome data (n = 2). b, Metagene interaction profiles of
SRP and TF. c, Left, TF and SRP interaction profiles with ManZ. Right,
SRP interaction profile in cells lacking TF (Δtig). Light grey shading
indicates the variation between two biological replicates. aa, amino acids;

arbitrary order, while only 14% of skipped TMDs and 8% of signal 
sequences have such a stretch. In agreement with this, the SRP-bound 
TMD2 of MsbA, but not TMD1, contains a four-residue-long hydrophobic
stretch (LVVI) and a lower ΔGapp (+1.05 kcal mol⁻¹ (TMD1),
−1.02 kcal mol⁻¹ (TMD2)). In addition to the strong N-terminal
enrichment of Lys and their reduced hydrophobicity, signal sequences 
have fewer aromatic residues and helix breakers than TMDs. 

The chaperone TF has been suggested to enhance substrate specificity 
of SRP in vivo and in vitro²³,²⁴, implying an important contribution
of TF to SRP function. Some studies indicate that SRP and TF compete
for ribosome or substrate binding”, while other studies suggest
that they co-exist on ribosomes”’. To resolve these inconsistencies, we 
determined the functional importance of TF for SRP substrate binding 
in vivo. SeRP data for both factors show very limited substrate overlap 
(Figs 1a and 4a). TF avoids binding IMPs and preferentially engages 
cytoplasmic, periplasmic and outer membrane proteins, and metagene 
analyses show that SRP and TF engage the bulk of nascent substrates at 
different time points during translation (Fig. 4b). For the few substrates 
engaged by both factors, we find strict temporal separation of RNC 
binding. Owing to the N-terminal position of most TMDs and the 
generally delayed binding of TF to nascent chains’, SRP mostly binds 
WT, wild type. d, Box plot representation of enrichment efficiencies in
SRP interactome data sets for proteins of different localizations (cytosol,
periplasm, outer membrane, lipoproteins and IMPs) in wild-type
cells, cells lacking TF (Δtig), cells overexpressing TF (TF↑) and cells
overexpressing SRP and FtsY (SRP↑+FtsY↑).


before TF (for example, MrcA; Extended Data Fig. 9). One exception is 
nascent ManZ, which is bound first by TF. The reason for this unusual
binding pattern is that SRP skips the three N-terminal TMDs of ManZ 
and only engages the fourth TMD positioned far from the N terminus 
(amino acids 147-167; Supplementary Table 1 and Fig. 4c). We also 
analysed the SRP interactome in cells lacking or overexpressing the tig 
gene encoding TF (Fig. 4d). Contrasting the proposed function of TF 
as a factor improving SRP specificity”, we find that the overexpression 
of tig does not affect SRP binding, and rather that tig deletion enhances 
SRP binding to IMPs and decreases binding to cytosolic proteins. 
Accordingly, the onset of initial SRP binding in ManZ is not changed 
in the absence of TF (Fig. 4c). SRP binding specificity was relaxed only 
by transient overexpression of SRP and its receptor FtsY, which triggered
more prominent SRP binding to cytoplasmic, periplasmic and
outer membrane proteins (Fig. 4d). 

Together, our analyses show that SRP, owing to its strong preference 
for TMDs, acts as the dominant and chaperone-independent router 
that specifically triages IMPs into the co-translational translocation 
pathway while periplasmic proteins and outer membrane proteins 
are exported post-translationally. We speculate that cells may employ 
the co-translational pathway specifically for IMPs because its slower 
translocation kinetics*®”’ alleviates critical steps of membrane insertion,
for example, by facilitating helix formation or diffusion of TMDs
into the lipid bilayer. In contrast, the faster SecA-dependent post- 
translational translocation is compatible with folding of periplasmic 
and outer membrane proteins, and may help to reduce the number of 
translocons required per cell. 

Our data suggest a revised model of co-translational protein translocation,
which often initiates on internal TMDs. We speculate that
substrates containing N-terminal skipped TMDs may enter the translocon
pore by either inserting a loop or additionally engaging SecA
for translocating the periplasmic parts of the protein. This model is 
supported by previous studies suggesting that SecA is involved in 
co-translational translocation of some IMPs”*. 

TF is not involved in substrate selection and acts on different 
substrate classes and at different time points during translation than 
SRP, contradicting previous reports”*~*. This distinguishes TF from the
nascent-polypeptide-associated complex NAC of eukaryotes, which, 
unlike TF, acts as an antagonist of SRP to sharpen SRP specificity in 
Caenorhabditis elegans***°. We conclude that bacterial SRP suffices to 
triage IMPs to the co-translational pathway. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Purification of SRP-RNCs for SeRP. E. coli cells (MC4100; ref. 31) were grown in
200 ml EZ Rich Defined Medium (EZ-RDM, Teknova), a rich supplemented MOPS
defined medium, at 37 °C to an OD600nm of 0.45. For growth of strains encoding
avi-tagged TF, 100 µg/ml ampicillin and 40 µg/ml D-biotin were added. For transient
overexpression of tig, cells transformed with pTrc-TF* were grown in the presence
of 250 µM IPTG for one duplication period. For SRP/FtsY overexpression, cells
were transformed with the plasmid pHQ4 (ref. 21), facilitating the overexpression
of ffs, ffh and ftsY. Growth, harvest and lysis of cells for SeRP were essentially done
as described’. Briefly, cells were harvested by rapid filtration (pore size 0.2 µm)
and frozen in liquid nitrogen. Lysis buffer (50 mM HEPES-KOH pH 7.5, 100 mM
NaCl, 10 mM MgCl₂, 5 mM CaCl₂, 1 mM chloramphenicol, 1 mM PMSF, 0.4%
Triton X-100, 0.1% NP-40, 50 mM octyl-β-D-glucopyranoside) was frozen in liquid
nitrogen. Frozen cells were mixed with 500 µl frozen lysis buffer supplemented
with 1.3 µl DNaseI and lysed by mixer milling (Retsch MM400, 10 ml jars, 2 min,
30 Hz). To the pulverized cells 500 µl cold lysis buffer was added. The RNA
concentration was determined and polysomes were digested using MNase (150 U/1
A260nm) for 5 min at 25 °C. The reaction was terminated by addition of 6 mM EGTA
and chilling on ice. Cell debris was removed by 5 min centrifugation (3,000 r.p.m.,
4 °C). Monosomes were purified by sucrose cushion centrifugation (lysis buffer
lacking CaCl₂ and supplemented with 30% sucrose) for 90 min, 75,000 r.p.m., at
4 °C (Beckmann AT2 S120 rotor). Pelleted ribosomes were washed once and
resuspended in lysis buffer lacking CaCl₂. Detergent-free SeRP was performed in the
absence of Triton X-100, NP-40 and octyl-β-D-glucopyranoside.

Selective purification of factor bound RNCs. Immunoprecipitation of SRP- 
RNCs. Per 200 ml filtered cells, 2.5 ml Dynabeads (Life Technologies) were used. 
Beads were washed three times with 5 ml PBS and incubated with 100 µl rabbit
anti-SRP antibody generated in our laboratory for 10 min at room temperature
under constant shaking. The beads were washed three times with 5 ml buffer
(TBS, 1 mM chloramphenicol, 10 mM MgCl₂, 0.1% Tween-20). Monosomes were
incubated with the affinity beads for 15 min at 4°C under constant shaking. The 
matrix was quickly washed three times with cold wash buffer and RNA was 
extracted by phenol-chloroform extraction. For detergent-free SeRP, Tween-20 
was omitted. 

Affinity purification of TF-RNCs. To equilibrate the Strep-Tactin sepharose, a 50% 
slurry was washed twice with 700 µl lysis buffer (50 mM HEPES-KOH pH 7.5, 100 mM
NaCl, 10 mM MgCl₂, 1 mM chloramphenicol, 1 mM PMSF, 0.4% Triton X-100, 0.1%
NP-40, 50 mM octyl-β-D-glucopyranoside) for 5 min at 4 °C. 2.5 ml of matrix was
used per 200 ml of cells. Monosomes were prepared as described before, omitting
the chemical crosslinking reaction and incubated with the affinity matrix for 1 h
at 4 °C under constant shaking. The matrix was washed three times with cold lysis
buffer (without CaCl₂). For TEV cleavage, the matrix was incubated for 1 h in
cleavage buffer (50 mM Tris-HCl pH 7.0, 200 mM NaCl, 10 mM MgCl₂, 1 mM
chloramphenicol). Elution was performed in three consecutive steps. In each step 100 µl
of cleavage buffer supplemented with nucleic acid-free TEV protease was added
and incubated for 30 min at room temperature. The RNA of the pooled elution
fractions was extracted by phenol-chloroform extraction. Detergent-free SeRP
was performed in the absence of Triton X-100, NP-40 and octyl-β-D-glucopyranoside.
Deep sequencing library. Ribosomal footprints were isolated and prepared for 
deep sequencing as previously described*?. 

Data analysis. No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded to allocation 
during experiments and outcome assessment. Sequencing reads were processed 
as described previously**. Further analyses were done using customized python 
scripts. Additional information on the code will be provided upon request. 

Peak detection analysis. SRP substrate identification was based on two methods, 
first on the ratio-based enrichment passing a threshold of 2.0 and second via peak 
detection. For peak detection the ratio over a window of 11 nucleotides (nt) of 
RPM-normalized interactome and translatome data of two replicates was built. 
Reproducibility of SRP interaction profiles was evaluated in the quality control 
step by Pearson correlation analysis. If a threshold of 0.6 was passed, genes were 
processed further. A continuous wavelet transformation was used to detect SRP- 
binding peaks (min_peak_width=6 nt, max_peak_width= 90 nt, min_length=9, 
gap_thresh = 50). SRP-binding peaks needed to pass a threshold of 5.0 in two 
replicates. 
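
The quality control and thresholding steps can be illustrated with a simplified sketch. It deliberately replaces the continuous wavelet transformation with a plain search for contiguous runs above the threshold, so it is a stand-in for illustration and not the procedure used for the published analysis. The function names, the truncation of windows at profile edges and the treatment of values equal to a threshold as passing are assumptions.

```python
def windowed_ratio(interactome, translatome, window=11):
    """Ratio of window sums centred on each nucleotide (windows are truncated at the edges)."""
    half = window // 2
    n = len(translatome)
    ratios = []
    for pos in range(n):
        lo, hi = max(0, pos - half), min(n, pos + half + 1)
        t = sum(translatome[lo:hi])
        i = sum(interactome[lo:hi])
        ratios.append(i / t if t > 0 else 0.0)
    return ratios


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a constant profile
    return sxy / (sxx * syy) ** 0.5


def call_peaks(ratio_rep1, ratio_rep2, threshold=5.0, min_width=6,
               min_correlation=0.6):
    """Return (start, end) intervals where both replicates reach the threshold.

    Genes whose replicate profiles correlate below min_correlation are
    rejected. Simple run detection is used here instead of a wavelet transform.
    """
    if pearson(ratio_rep1, ratio_rep2) < min_correlation:
        return []
    peaks, start = [], None
    for pos, (a, b) in enumerate(zip(ratio_rep1, ratio_rep2)):
        above = a >= threshold and b >= threshold
        if above and start is None:
            start = pos
        elif not above and start is not None:
            if pos - start >= min_width:
                peaks.append((start, pos))
            start = None
    if start is not None and len(ratio_rep1) - start >= min_width:
        peaks.append((start, len(ratio_rep1)))
    return peaks
```

Requiring the threshold in both replicates, after the correlation filter, is what restricts the output to reproducibly detected peaks.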

TMD prediction. To predict transmembrane domains in proteins, we used the ΔG
prediction server, http://dgpred.cbr.su.se (ref. 20). The advantage of this method
is that it uses a biological membrane insertion scale that has been developed from
experimental data**. Each hydrophobic segment receives a score that is 3 or less; 
this score represents the insertion free energy difference (in kcal/mol) between the 
translocon channel and the bilayer. Each hydrophobic segment predicted by this 
server was seen as a potential transmembrane domain in the analysis done later 


with this data. The server script was run with default server settings (Helix min 
length: 19; Helix max length: 23; Length correction: ON). 
Identification of peptide properties. The identification of peptide properties associated
with SRP binding was performed as follows. SRP-bound TMDs were
compared to those TMDs skipped by SRP using a reference pool of typical SRP 
substrates. The number of residues contributing to a TMD in the recognition range 
between residues 26-50 was evaluated and only those 299 substrate peptides with 
a TMD exposure of at least 17 residues length were kept for further analysis. The 
TMD region and the additional C- or N-terminal flanking regions within the 
recognition site were analysed separately. The average number of residues in a 
TMD and in the flanking regions for the pool of 299 substrates was determined, 
thereby resulting in average lengths of 17 residues for a TMD, 3 residues for a 
C-terminal flanking region and 4 residues for an N-terminal flanking region. For 
each skipped TMD, a C-terminal flanking region of 3 residues and a TMD of 17 
residues (counting from the C terminus, 77 sequences in total) were collected for 
comparison with bound TMD and C-terminal flanking regions, as well as a TMD 
of 17 residues (counting from the N-terminal end, 87 sequences in total) and an 
N-terminal flanking region of 4 residues for comparison with bound TMD and 
N-terminal flanking regions. If the flanking regions were found to overlap with 
the following or previous TMDs, the TMD with its flanking region was skipped. 
Peptide properties were computed by analysing amino acid content. Types of
residues were collected from amino acid counts, combining D and E as acidic, I,
L, M and V as aliphatic, F, W and Y as aromatic, H, K and R as basic, G and P as
helix breaking and C, N, Q, S and T as polar. Average Kyte-Doolittle values, ΔG
values, hydrophobic moments, isoelectric points, volumes, and α-helical, β-strand
and turn propensities were collected as a sum over the reference values for all
contributing residues divided by the total number of residues in the analysed
sequence. The percentage of sequences containing residues contributing to a consecutive
hydrophobic stretch was also determined. In addition, signal peptides,
which have a length of 23 residues on average, were analysed in a similar fashion 
and percentages of values with respect to the sequence length were computed. 
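
The residue classes and the hydrophobic stretch criterion described above can be sketched as follows. The residue grouping follows the text and the hydropathy values are the standard Kyte-Doolittle scale; the function names are illustrative assumptions, and the other properties (ΔG values, hydrophobic moments, isoelectric points, volumes, secondary structure propensities) are not covered by this sketch.

```python
# Residue classes as listed in the text (alanine is not assigned to any class).
RESIDUE_CLASSES = {
    "acidic": "DE",
    "aliphatic": "ILMV",
    "aromatic": "FWY",
    "basic": "HKR",
    "helix_breaking": "GP",
    "polar": "CNQST",
}

# Standard Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def class_fractions(sequence):
    """Fraction of residues in each class, relative to the sequence length."""
    return {name: sum(sequence.count(res) for res in residues) / len(sequence)
            for name, residues in RESIDUE_CLASSES.items()}


def mean_hydropathy(sequence):
    """Sum of Kyte-Doolittle values divided by the number of residues."""
    return sum(KYTE_DOOLITTLE[res] for res in sequence) / len(sequence)


def longest_hydrophobic_stretch(sequence, residues="FILV"):
    """Length of the longest run of consecutive Phe, Ile, Leu or Val residues."""
    best = current = 0
    for res in sequence:
        current = current + 1 if res in residues else 0
        best = max(best, current)
    return best
```

Applied to fragments copied from the MsbA nascent chain sequences listed in the Methods, `longest_hydrophobic_stretch("MPLVVIGLMILRG")` finds the four-residue LVVI stretch of TMD2, whereas the TMD1 fragment `"AGLIVAGVALILNAA"` contains no run longer than three residues.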
Cloning of MsbA swap plasmid. A silently mutated DNA fragment encoding 
the N-terminal part of MsbA (bp 1-303) with swapped TMD1 and TMD2 was 
synthesized, fused with the 5’-end of eyfp and cloned in pTrc99B by In-Fusion
cloning (Clontech Laboratories).
Purification of RNCs for in vitro binding studies. Plasmids encoding N-terminal 
fragments of DsbA and MsbA fused to the N terminus of the minimal SecM stalling 
sequence were constructed using standard protocols and SecM encoding plasmids 
described previously*’. Stalled RNCs contained N-terminal Strep-tagged Sumo 
fusion proteins. RNCs were generated in cells lacking trigger factor and purified 
by Strep-tag affinity purification followed by removal of the N-terminal Strep- 
Sumo tag using the protease fragment of the Sumo protease Ulp1 as described*. 
The nascent chain sequences remaining after purification are (signal sequence
(SS) and TMD in bold):
MsbA(TMD1)-SecM, MHNDKDLSTWQTFRRLWPTIAPFKAGLIVAGVALILNAASDTFMLSLLKPLLFSTPVWISQAQGIRAGP;
MsbA(TMD2)-SecM, MDGFGKTDRSVLVWMPLVVIGLMILRGITSYVSSYCISWVSESTPVWISQAQGIRAGP;
DsbA(SS)-SecM, MKKIWLALAGLVLAFSASAAQYEDGKQYTTLEFSTPVWISQAQGIRAGP.
RNC in vitro binding studies. Purified SRP (1.5 µM) and RNCs (0.5 µM) were
mixed and incubated for 15 min in low salt buffer (50 mM HEPES-KOH pH 7.0,
100 mM KOAc, 12 mM MgOAc, 1 mM DTT and Roche complete protease inhibitor).
The salt concentration was adjusted and samples were loaded on 30% sucrose
cushions prepared with salt-adjusted binding buffers. RNCs were sedimented by
centrifugation for 75 min, 75,000 r.p.m., at 4 °C (Beckmann AT2 S100 rotor). Sixty
microlitres of the supernatant was mixed with SDS sample buffer, the remaining 
supernatant was discarded and the pelleted ribosomes were resuspended. Samples 
were analysed by SDS-PAGE (4-20% gradient gels) followed by staining using 
SYPRO Ruby Protein Gel Stain (Thermo Fisher Scientific). 


31. Casadaban, M. J. Transposition and fusion of the lac genes to selected
promoters in Escherichia coli using bacteriophage lambda and Mu. J. Mol. Biol.
104, 541-555 (1976).
(2004).
2212-2239 (2013).
35. Rutkowska, A. et al. Large-scale purification of ribosome-nascent chain
complexes for biochemical and structural studies. FEBS Lett. 583, 2407-2413
(2009).




Extended Data Figure 1 | Selective ribosome profiling of E. coli SRP.
a, Experimental scheme of selective ribosome profiling (SeRP) of
E. coli SRP-bound RNCs. Cells were harvested in mid-log phase via
rapid filtration, frozen in liquid nitrogen and lysed in a frozen state with
a cryo mill. After thawing, polysomes were digested with micrococcal
nuclease. Monosomes were purified by sucrose cushion centrifugation
(translatome). SRP-bound RNCs were immunopurified using an SRP-specific
polyclonal rabbit antibody (SRP interactome). b, Bioanalyzer
spectra quantifying the amount of co-purified ribosomes in control
immunoprecipitation (top) and SRP immunoprecipitation (bottom). The
16S ribosomal RNA (rRNA) of the small ribosomal subunit and the 23S
rRNA of the large subunit are indicated. c, Reproducibility of translatome
(left) and SRP interactome (right) data sets from biological replicates.
d, Gene expression levels of translatome and SRP interactome are
compared. Only SRP substrates that pass a threshold of twofold 
enrichment are coloured according to localization (cytoplasm in blue, 
inner membrane in red, outer membrane, lipoproteins and periplasm in 
green, no localization known in grey). e, CopA ratio-enrichment profile 
of SRP interactome and translatome, Pearson correlation coefficient 
0.74. Light grey shadows indicate the variation between two biological 
replicates. f, DsbA ratio-enrichment profile, Pearson correlation 
coefficient 0.67. Shadows as in e. g, DnaK ratio-enrichment profile of SRP 
interactome and translatome. Shadows as in e. 
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Extended Data Figure 2 | Selective ribosome profiling of E. coli SRP 
omitting detergents. a, Gene expression levels of the translatome and 
SRP interactome are compared for different experimental setups (with 
detergents in lysis and wash buffer (n = 2), detergent only in wash 
buffer (n = 2) and omitting detergents entirely (n = 1)). ORFs are coloured 
according to localization (cytoplasm in blue, inner membrane in red, outer 
membrane, lipoproteins and periplasm in green, no localization known 
in grey). b, DsbA ratio-enrichment profile in the absence of detergents 
(n = 1). c, Metagene SRP interaction profile aligned to the N terminus of 
the initial TMD that is skipped in the presence of detergents (orange) and 
in the absence of detergents (black). d, Ratio-enrichment profiles in the 
absence of detergents of MetI, MsbA and ManZ (n = 1). 
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Extended Data Figure 3 | Interaction profiles of SRP with nascent YnhF, 
YohO, YbgT, MgrB and YbhT. Light grey shading indicates the variation 
between two biological replicates. 
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Extended Data Figure 4 | Heatmap representation of TMD positioning 
of inner membrane SRP substrates at the time point of first SRP 
binding. TMDs are shown in dark red and segments (loops) located 
outside the membrane bilayer are shown in light grey, dashed lines indicate 
the area of the ribosomal tunnel exit. Substrates that are bound by SRP 
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without exposing a TMD near the ribosome surface are shown at amino 
acid resolution. Amino acid colour code: highly hydrophobic amino acids, 
black; medium hydrophobic amino acids, dark grey; low hydrophobic 
amino acids, light grey; basic amino acids, blue; acidic amino acids, red; 
helix breakers, green. 
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Extended Data Figure 6 | SRP interaction profiles. a—c, SRP interaction 
profile with nascent UraA (a), MetI (b) and SecY (c). SRP-binding peaks 
are correlated with protein topology. Light grey shading indicates variation 
between two biological replicates. d, SRP interaction profile with nascent 
CyoA in wild-type (WT) cells and cells overexpressing SRP and FtsY. 
Shading as in a-c. 
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Cotranslational signal-independent SRP preloading 
during membrane targeting 


Justin W. Chartron¹, Katherine C. L. Hunt¹ & Judith Frydman¹,² 


Ribosome-associated factors must properly decode the limited 
information available in nascent polypeptides to direct them to 
their correct cellular fate’. It is unclear how the low complexity 
information exposed by the nascent chain suffices for accurate 
recognition by the many factors competing for the limited surface 
near the ribosomal exit site. Questions remain even for the well- 
studied cotranslational targeting cycle to the endoplasmic reticulum, 
involving recognition of linear hydrophobic signal sequences or 
transmembrane domains by the signal recognition particle (SRP)*°. 
Notably, the SRP has low abundance relative to the large number 
of ribosome-nascent-chain complexes (RNCs), yet it accurately 
selects those destined for the endoplasmic reticulum®. Despite 
their overlapping specificities, the SRP and the cotranslationally 
acting Hsp70 display precise mutually exclusive selectivity in vivo 
for their cognate RNCs”*. To understand cotranslational nascent 
chain recognition in vivo, here we investigate the cotranslational 
membrane-targeting cycle using ribosome profiling” in yeast cells 
coupled with biochemical fractionation of ribosome populations. 
We show that the SRP preferentially binds secretory RNCs before 
their targeting signals are translated. Non-coding mRNA elements 
can promote this signal-independent pre-recruitment of SRP. Our 
study defines the complex kinetic interaction between elongation 
in the cytosol and determinants in the polypeptide and mRNA that 
modulate SRP-substrate selection and membrane targeting. 

Secretory proteins are proposed to target to the endoplasmic 
reticulum (ER) membrane either co- or post-translationally for subse- 
quent translocation'”"!”. Mechanistic models of ER targeting and the 
role of the SRP derive primarily from cell-free systems using model 
proteins'!°, raising the question of how these pathways function in the 
cell. To investigate membrane targeting in vivo, we fractionated soluble 
and membrane-attached ribosomes from yeast cells, and then used ribo- 
some profiling (termed Ribo-seq)’ to compare the ribosome-protected 
mRNA footprints from polysomes obtained from both fractions 
(Extended Data Fig. 1a). We derived a cotranslational membrane 
enrichment score for each coding sequence (Methods, Extended Data 
Fig. 1b and Supplementary Table 1). Transcripts encoding cytosolic 
or nuclear (cytonuclear) proteins were preferentially translated on 
cytosolic ribosomes and not enriched on membrane polysomes 
(Fig. 1a). Tail-anchored proteins, whose single transmembrane 
domain (TMD) at the carboxyl terminus is only revealed 
posttranslationally'*, were also translated on cytosolic ribosomes. By 
contrast, many nuclear-encoded mitochondrial protein transcripts were 
enriched in the membrane-bound ribosome fraction, as expected! 
Transcripts encoding ER-destined secretory proteins were highly 
enriched on membrane-bound ribosomes. Proteins containing a signal 
sequence (SS) or TMD had comparable cotranslational membrane 
enrichment, conflicting with the idea that the targeting signal itself 
distinguishes which proteins are targeted co- or post-translationally 
to the ER!!? (Fig. 1a). 
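
As an illustration of how a per-ORF membrane enrichment score of this kind could be computed, the following Python sketch takes the log2 ratio of library-size-normalized read counts in the membrane and soluble fractions. The exact scoring procedure is defined in the Methods and is not reproduced here; the function name, the pseudocount and the ORF labels are assumptions made for illustration only.

```python
import math


def enrichment_scores(membrane_counts, soluble_counts, pseudocount=0.5):
    """Return log2(membrane / soluble) per ORF.

    Both inputs map an ORF name to its ribosome-protected read count.
    Counts are scaled to reads per million so that libraries of
    different depth are comparable (an assumed normalization).
    """
    mem_total = sum(membrane_counts.values())
    sol_total = sum(soluble_counts.values())
    scores = {}
    # Only score ORFs that were measured in both fractions.
    for orf in membrane_counts.keys() & soluble_counts.keys():
        mem_rpm = (membrane_counts[orf] + pseudocount) / mem_total * 1e6
        sol_rpm = (soluble_counts[orf] + pseudocount) / sol_total * 1e6
        scores[orf] = math.log2(mem_rpm / sol_rpm)
    return scores
```

Under these assumptions, a positive score marks an ORF translated preferentially on membrane-bound ribosomes and a negative score marks one translated preferentially on soluble ribosomes; a score of 1 corresponds to the twofold enrichment threshold used in the figure legends.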

Ribosome profiling provides a snapshot of the abundance of 
ribosomes at each codon of each mRNA’, revealing the dynamics 
of translation on soluble versus membrane-bound ribosomes. For 
cytonuclear proteins, soluble ribosome-protected reads were 
distributed across the entire reading frame, consistent with complete 
translation in the cytosol (Extended Data Fig. 1c). For secretory 
proteins, both soluble and membrane-bound polysomes produced 
protected reads. Cytosolic translation represented only a small fraction 
of any given secretory transcript, and most of the secretory mRNA pool 
was membrane anchored. In the classical understanding of cotransla- 
tional targeting, secretory protein RNCs bind to the membrane only 
after exposing a targeting signal’. Thus, there should be fewer RNCs 
found on the membrane translating the portion of transcripts not yet 
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Figure 1 | Cotranslational membrane enrichment. a, Distributions of 
the open reading frame (ORF) enrichment of ribosome-protected reads 
in the membrane fraction compared to the soluble fraction. ORFs were 
alternatively classified by expected SRP dependence"’. Values are the mean 
from two biological replicates. **P < 0.01, Wilcoxon rank-sum test. 
b, Ribosome-protected reads at each codon of an example transmembrane 
protein OLE1. Membrane topology is indicated above, with the first TMD 
in lavender. c, Metagene analysis of soluble fraction polysome-protected 
reads from transcripts that were at least twofold membrane enriched. 
ORFs were aligned at the targeting signal and scaled. d, Cotranslational 
membrane targeting is in competition with elongation. e, Elongation 
inhibitors provide additional time for polysomes exposing a targeting 
signal to localize to the membrane. f, Membrane enrichment was limited 
by the length of the reading frame remaining after the encoding of 
targeting signals. The vertical dashed line indicates 50 codons. 
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targeted, that is, at codon positions upstream of the first SS or TMD. 
However, the membrane-bound ribosome-protected reads were evenly 
distributed across the entire transcript (Fig. 1b and Extended Data 
Fig. 1c, d). This suggests that once targeted, secretory mRNAs remain 
associated to the ER and their translation initiates at the membrane. 
This is consistent with the observed proximity of secretory RNCs to the 
translocon before synthesis of the targeting signal'®. The small fraction 
of secretory mRNA in cytoplasmic pre-targeted RNCs probably 
represents the pioneer round of targeting. 

The positioning of soluble ribosomes along mRNA provides 
insight into how secretory transcripts are targeted to the membrane. 
The highest read density for these messages mapped 5’ of the region 
encoding the first SS or TMD; read density declined after the first 
targeting signal was exposed by the ribosome, as expected from 
cotranslational signal-dependent targeting of soluble RNCs to the 
membrane (Fig. 1b, c and Extended Data Fig. 1d). Surprisingly, the 
loss of reads after signal emergence was gradual, resulting in many 
RNCs that remained soluble for hundreds of residues after SS or TMD 
exposure. This result was inconsistent with the elongation attenua- 
tion activity proposed for the SRP!”!8 and suggests that elongation 
continues on cytosolic RNCs upon exposure of a targeting signal 
(see Supplementary Discussion). 
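
The read-density comparison around the first targeting signal can be pictured as a metagene profile. The Python sketch below aligns per-codon read counts from several transcripts at the first codon of their targeting signal and averages them position by position. It is a minimal illustration, not the authors' pipeline: the figure legend states only that ORFs were "aligned at the targeting signal and scaled", so the mean-normalization used here, the function name and the window sizes are all assumptions.

```python
def metagene_profile(profiles, signal_starts, upstream=50, downstream=200):
    """Average normalized read density around the targeting signal.

    profiles:      dict mapping gene -> list of read counts per codon
    signal_starts: dict mapping gene -> codon index of the first SS/TMD codon
    Returns a list covering offsets -upstream..+downstream; positions with
    no contributing gene are reported as None.
    """
    width = upstream + downstream + 1
    sums = [0.0] * width
    counts = [0] * width
    for gene, reads in profiles.items():
        mean = sum(reads) / len(reads)
        if mean == 0:
            continue  # skip genes with no coverage
        start = signal_starts[gene]
        for offset in range(-upstream, downstream + 1):
            pos = start + offset
            if 0 <= pos < len(reads):
                # Normalize by the gene's mean density so that highly
                # expressed genes do not dominate the average.
                sums[offset + upstream] += reads[pos] / mean
                counts[offset + upstream] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

In such a profile, a gradual rather than abrupt decline in soluble reads downstream of the signal would appear as a slow decay across the positive offsets.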

The idea that there is a kinetic competition between continuing 
elongation in the cytosol and RNC targeting to the membrane makes 
two testable predictions (Fig. 1d). First, pharmacological inhibition of 
elongation with cycloheximide (CHX) should decouple these processes, 
enhancing targeting of translocation-competent RNCs and promoting 
their depletion from the soluble fraction (Fig. 1e). Cells were subjected 
to a brief, two-minute CHX incubation before Ribo-seq analysis of 
soluble and membrane-bound polysomes. Importantly, such brief 
incubation did not perturb non-secretory polysomes (Extended Data 
Fig. 1c). By contrast, CHX treatment markedly reduced the soluble 
secretory reads, but only after cytosolic RNCs exposed the first SS or 
TMD, that is, 40 codons after its synthesis (Fig. 1b, c and Extended 
Data Fig. 1d). 

The kinetic competition between targeting and elongation predicts 
that cotranslational membrane attachment is influenced by translation 
termination. In the absence of an elongation arrest, the probability of 
RNCs reaching the membrane cotranslationally will decrease as the first 
SS or TMD is found closer to the C terminus (Fig. 1f and Extended Data 
Fig. 1e). Indeed, we observed a decline in the maximum membrane 
enrichment of secretory RNCs when the first targeting signal is near the 
C terminus. Thus, secretory proteins with a late targeting signal, SS or 
TMD, must be targeted to the ER posttranslationally (Supplementary 
Discussion). Overall, our data suggest that cotranslational targeting to 
the ER in yeast is accomplished via a pioneer round of translation on 
soluble ribosomes that establishes a pool of ER-residing mRNAs that 
initiate translation at the membrane (Extended Data Fig. 1f). 
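
The kinetic competition described above can be made concrete with a toy model. The Python sketch below assumes that targeting is a first-order process that can begin only once the signal has cleared the ribosomal tunnel (taken as 40 codons after the first codon of the signal, as in the text), and that it must finish before the ribosome reaches the stop codon. The rate constants are hypothetical placeholders, not measured values, and the exponential form is an assumption chosen for simplicity.

```python
import math


def cotranslational_targeting_probability(
    orf_length,
    signal_start,
    k_target=0.1,         # hypothetical targeting rate (per second)
    elongation_rate=5.0,  # hypothetical elongation rate (codons per second)
    tunnel_length=40,     # codons needed to expose the signal
):
    """Probability that an RNC reaches the membrane before termination."""
    # Codons translated while the signal is exposed to the cytosol.
    exposed_codons = orf_length - (signal_start + tunnel_length)
    if exposed_codons <= 0:
        return 0.0  # signal never emerges cotranslationally
    time_window = exposed_codons / elongation_rate
    return 1.0 - math.exp(-k_target * time_window)
```

In this toy model, moving the signal towards the C terminus shrinks the time window and lowers the probability, and reducing `elongation_rate` (as cycloheximide would) lengthens the window and raises it, which matches the direction of both effects reported above.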

We next determined which RNCs are substrates of the SRP 
in vivo. Immunoprecipitation of Srp72p from total soluble RNCs was 
followed by ribosome profiling of both SRP-associated polysomes 
and monosomes (Fig. 2a and Extended Data Fig. 2a). Few transcripts 
encoding cytonuclear or mitochondrial proteins were enriched on SRP, 
confirming its specificity towards ER-destined transcripts. Notably, the 
SRP bound to all secretory RNCs that were cotranslationally targeted 
to the membrane, including SRP-dependent and SRP-independent 
proteins (Fig. 2b, c). 

The number of ribosome-protected reads from soluble, SRP- 
bound transcripts diminished after ribosome exposure of the first SS 
or TMD, as expected from its targeting function (Fig. 2d). The loss 
was gradual and many SRP-RNCs remained soluble well after the 
targeting signal became fully exposed to the cytosol. This supports 
the notion that elongation proceeds on cytosolic ribosomes even after 
SRP binds, in contrast with the expected SRP-induced elongation 
arrest. Indeed, blocking elongation with CHX for 2 min before lysis 
caused a marked depletion in SRP-bound reads, but only for RNCs 
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Figure 2 | Cotranslational enrichment of SRP. a, Srp72p-TAP was 
immunoprecipitated from the total soluble fraction. SRP-bound 
monosomes and polysomes were separated by sucrose gradient 
ultracentrifugation. b, Distributions of the ORF enrichment of ribosome- 
protected reads from SRP-bound soluble polysomes over the total 
soluble polysomes. ORFs were alternatively classified by expected SRP 
dependence’. Values are the mean from two biological replicates. 
TA, tail-anchored. *P < 0.05, Wilcoxon rank-sum test. c, Cotranslational 
membrane-fraction enrichment compared to SRP enrichment. 
d, Metagene analysis of soluble SRP-bound polysome-protected reads 
from transcripts that are at least twofold SRP-enriched. ORFs were 
aligned at the targeting signal and scaled. 


exposing their first targeting signal (Fig. 2d). In principle, the delayed 
targeting of soluble RNCs to the membrane after SS/TMD emergence 
could reflect a delay in SRP binding rather than a lack of elongation 
arrest. Comparing the SRP and membrane enrichment to transcripts 
indicated that this is not the case. RNCs encoding late targeting signals, 
that is, near the C terminus, still bound SRP but did not target to the 
ER membrane (Supplementary Discussion, Fig. 2c and Extended Data 
Fig. 2b-d). Addition of CHX allowed these late-signal RNCs to enrich 
at the membrane, indicating the SRP-RNC complexes are competent 
for ER-targeting. We conclude that the SRP binds the nascent chain 
quickly, and continued elongation causes termination of late signals 
before targeting. 

Although elongation arrest is not a general consequence of SRP 
binding in vivo, recent work showed that a rare-codon-directed 
slowdown of elongation facilitates SRP binding’’. An intrinsic, non-SRP- 
dependent elongation slowdown should increase ribosome-protected 
reads at the same codon in both soluble SRP-bound and membrane- 
bound polysomes. Indeed, several transcripts presented such local 
increases in ribosome-protected reads at sites corresponding to exposure 
of a targeting signal on the ribosome (Extended Data Fig. 3a-c). 
Distinct elongation attenuation mechanisms observed at these sites 
included clusters of rare codons’? and stalling polypeptide elements, 
such as stretches of positively charged amino acids, or proline motifs, 
positioned within the exit tunnel?”?!. While most secretory transcripts 
were not significantly enriched in these attenuator elements compared to 
the proteome (Extended Data Fig. 3d, e), the few non-secretory proteins 
that cotranslationally bound to the SRP were enriched in elongation 
attenuation elements positioned at sites that exposed a near-cognate 
hydrophobic sequence for SRP binding (Extended Data Fig. 3d, f). We 
speculate that the presence of such elements enhances SRP recognition 
of the near-cognate hydrophobic tracts in these non-secretory proteins. 

To understand the basis for the specificity of the SRP in vivo, we 
next determined the initial point of SRP recruitment to ribosomes 
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translating secretory proteins. Because polysomes require only a 
single SRP-bound ribosome to co-purify with Srp72p, additional 
strategies were necessary to identify mRNA footprints that originated 
from a single SRP-bound ribosome. We developed a protocol using 
in vivo monosomes to identify the initial SRP binding event on RNCs 
(Fig. 2a). At any given time, a fraction of transcripts contains only a 
single actively translating ribosome (Extended Data Fig. 4a). Total 
soluble monosomes yield a similar distribution of protected reads 
compared to polysomes (Extended Data Fig. 4b-e and Supplementary 
Discussion). We separated soluble SRP-bound monosomes from SRP- 
bound polysomes and subjected both fractions to Ribo-seq analysis 
(Extended Data Fig. 5a, b). Of note, the monosomes were necessarily 
bound to the SRP during the purification, and thus should reveal which 
codons are responsible for the initial SRP recruitment step. 

The canonical model that the SRP recognizes the nascent chain 
after the targeting signal exits the ribosome’ (Fig. 3a) makes several 
predictions. First, there should be few monosome-protected reads 
relative to polysomes before the first SS/TMD emerging from the 
ribosome tunnel; second, ribosome footprints should increase 
beginning approximately 40 codons after the first codon in the tar- 
geting signal, and third, monosome reads should decrease after full 
exposure of the SS/TMD, as SRP-RNCs are delivered to the membrane. 
Indeed, these patterns were observed in a subset of secretory transcripts 
with significantly more hydrophobic signals (Fig. 3b, Extended Data 
Figs 2e, f and 5c). SRP recruitment to these RNCs only occurred when 
the translated signals were fully exposed, and not while still in the exit 
tunnel**”? (Extended Data Fig. 5d and Supplementary Discussion). 
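
The three predictions of the canonical model divide each transcript into three zones relative to the first targeting signal: before the signal is encoded, while the signal is still inside the exit tunnel, and after it is exposed. The Python sketch below shows one way to assign codon positions to these zones and summarize a per-codon enrichment track by zone. The 40-codon tunnel length follows the text; the function names, zone labels and the use of a simple mean are assumptions for illustration.

```python
def classify_codon(codon, signal_start, tunnel_length=40):
    """Assign a codon index to a zone relative to the first SS/TMD."""
    if codon < signal_start:
        return "before_signal"
    if codon < signal_start + tunnel_length:
        return "in_tunnel"
    return "exposed"


def zone_means(values, signal_start, tunnel_length=40):
    """Mean of a per-codon track (e.g. monosome/polysome ratio) per zone."""
    totals = {}
    for codon, value in enumerate(values):
        zone = classify_codon(codon, signal_start, tunnel_length)
        total, n = totals.get(zone, (0.0, 0))
        totals[zone] = (total + value, n + 1)
    return {zone: total / n for zone, (total, n) in totals.items()}
```

Under the canonical model, the "before_signal" and "in_tunnel" means would be low and enrichment would appear only in the "exposed" zone; high values in the "before_signal" zone would instead indicate recruitment that does not depend on an exposed signal.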

Notably, most secretory transcripts did not conform to the 
predictions of the model (Fig. 3b, c and Extended Data Fig. 5e). 
Instead, ribosome footprints from most SRP-bound monosomes 
were abundant well before translation of the first targeting signal. 
For instance, the RNCs of DAP2 were enriched on SRP from the start 
codon. For membrane proteins, SRP enrichment could be observed up 
to hundreds of codons before translation of the first TMD (Fig. 3d and 
Extended Data Fig. 5e). Thus, the exquisite selectivity of SRP towards 
secretory transcripts occurs via RNCs that have not yet translated any 
SS or TMD. Of note, the SRP-bound monosome reads did diminish 
upon full signal exposure by the ribosome. Thus, SRP is pre-recruited 
to secretory RNCs before the synthesis of an SS or TMD, but only after 
the emergence and recognition of the targeting signal can it promote 
membrane targeting, presumably owing to a conformational change 
in the SRP-RNC complex”**. Our findings show that the SRP stably 
and preferentially binds ribosomes translating secretory mRNAs in a 
manner independent from the sequence of the exposed nascent chain 
(Fig. 3e). Models in which SRP scans all ribosomes with high affinity 
and rapid kinetics” do not explain our findings, as discussed in the 
Supplementary Discussion. 

To begin to understand the determinants that confer specific 
recruitment of SRP without an exposed SS or TMD, we examined the 
most extreme cases of nascent-chain-independent SRP recruitment. 
PMP1 and PMP2 encode two abundant, small membrane proteins of 
40 and 43 amino acids, respectively. Even though the entire proteins 
are smaller than the length of the ribosomal tunnel, PMP1 and PMP2 
RNCs bind to the SRP throughout translation (Fig. 3f and Extended 
Data Fig. 6a, b). 

We considered whether non-coding mRNA determinants could 
confer nascent-chain-independent SRP recruitment. PMP1 and PMP2 
contain long 3’ untranslated regions (UTRs) implicated in membrane 
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Figure 3 | Distinct mechanisms of SRP recruitment. a, Recruitment 

of SRP to RNCs is expected to increase ribosome-protected reads from 
SRP-bound monosomes when an SS or TMD is exposed to the cytosol 
(orange). b, Distributions of SRP-bound ribosome reads on representative 
transcripts from CHX-treated cultures. Selected transcripts are RCR2 and 
VBA4. c, Most secretory proteins demonstrated SRP enrichment before 
signal exposure. d, Metagene plot of the median value of enrichment of 
SRP-bound monosomes compared to polysomes. Included transcripts 
encode TMDs at least 40 codons from the start codon. Shaded areas 
represent enrichment before the TMD is encoded (cyan), while the TMD 
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is in the ribosome exit tunnel (lavender), and after the TMD is exposed 
(orange). e, Two mechanisms for SRP to select secretory mRNA. f, PMP1 
and PMP2 were the only tail-anchored proteins that enriched SRP. 
g, The GFP ORF was fused to the indicated 3’ UTRs and expressed 
in vivo. Srp72p-TAP was immunoprecipitated from the total soluble 
fraction and RNAs were subject to quantitative PCR (qPCR). n = 3 
biological replicates; **P < 0.01, Welch’s t-test. h, Puromycin treatment 
of lysate from yeast expressing GFP with the PMP1 3’ UTR was followed 
by SRP immunoprecipitation and qPCR. n = 3 biological replicates; 
*P < 0.05, Welch’s t-test. 
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attachment”®. We thus tested the effect of fusing the 3’ UTR of either 
PMP1 or PMP2 to the mRNA of cytosolic green fluorescent protein 
(GFP) lacking any targeting signal (Fig. 3g). The 3’ UTR of cytosolic 
protein TUB2 served as a control. Notably, the 3’ UTRs of either PMP1 
or PMP2 conferred cotranslational SRP binding to the GFP transcripts, 
as well as membrane localization, whereas the 3’ UTR of TUB2 did not 
(Fig. 3g, Extended Data Fig. 6c). For all constructs, GFP protein was 
diffuse and cytosolic, indicating that the 3’ UTR alone is insufficient 
to promote substantial translocation of GFP into the ER (Extended 
Data Fig. 6d). Notably, the 3’ UTR of endogenous PMP1 is functionally 
important in vivo. Thus, replacing the 3’ UTR of the PMP1 gene with 
the 3’ UTR of TUB2 resulted in a growth defect more severe than 
complete deletion of the entire PMP1 gene (Extended Data Fig. 6e). 
Perhaps mislocalization of the TMD in the absence of the 3’ UTR is 
more toxic than the loss of gene function. 

Two non-exclusive models can account for SRP recruitment by the 
PMP1 and PMP2 3’ UTRs. First, SRP binds to the mRNA, either directly 
or through other RNA-binding proteins. Alternatively, ribosomes 
translating PMP1 or PMP2 recruit SRP in a 3’ UTR-mediated manner 
(Extended Data Fig. 6f). To distinguish between these possibilities, 
a puromycin incubation was used to disrupt elongating’’ ribosomes 
before fractionation and SRP immunoprecipitation. This treatment 
caused a significant reduction in the GFP-PMP1 mRNA that 
copurified with SRP (Fig. 3h). Thus, translating ribosomes promote 
SRP recruitment to the GFP-PMP1 transcript. Of note, puromycin 
also disrupted the SRP interaction with the SEC61 mRNA control. We 
thus next examined the general role of translation in SRP recruitment. 

We assessed the global ribosome dependency of SRP binding to 
secretory transcripts using either puromycin or CHX incubations 
to disrupt or stabilize elongating ribosomes, respectively. Srp72p- 
bound transcripts isolated from the soluble fraction were examined 
by RNA-seq (Fig. 4a). SRP association with all secretory mRNAs was 
sensitive to puromycin. Transcripts that only recruit SRP through 
a canonical nascent chain interaction were more dependent on 
elongating ribosomes. The reduced puromycin sensitivity observed 
for pre-enriched transcripts may arise from the inability of puromy- 
cin to disrupt initiating ribosomes”’, which appear able to recruit SRP 
(Extended Data Fig. 6g). 

We next examined whether the membrane association of secretory 
transcripts similarly depends on continuing translation. In principle, 
ER-localized proteins could recruit secretory transcripts to the 
membrane in the absence of translation”®?* (Fig. 4b). Membrane and 
soluble mRNAs were fractionated in the presence and absence of puro- 
mycin treatment and subjected to RNA sequencing (RNA-seq) analysis 
(Extended Data Fig. 7). Disruption of translating ribosomes reduced 
membrane enrichment for all secretory protein transcripts, includ- 
ing those of PMP1 and PMP2. This result was confirmed using the 
GFP-PMP1 reporter (Extended Data Fig. 6h). 

The translation-dependence of membrane association for secretory 
transcripts was further examined using a temperature-sensitive allele 
of eIF3 subunit PRT1, prt1-1. Shifting cells to the non-permissive 
temperature precludes mRNA binding to the 40S subunit’, allowing 
elongating ribosomes to run off (Fig. 4c). After displacing ribosomes 
from the mRNA, soluble and membrane fractions were analysed by 
RNA-seq. Notably, the only mRNAs that remained in the membrane 
fraction corresponded to mitochondrially encoded proteins. The 
membrane enrichment of all ER-destined secretory transcripts was 
abolished in the absence of translation (Fig. 4d). Thus, translation is 
required for the observed association of mRNAs with membranes. 

Our findings define the principles of cotranslational membrane 
targeting and the role of SRP in this process and provide a solution 
to the paradox of how SRP achieves exquisite specificity in vivo 
despite its low abundance, its substrate binding promiscuity, and 
despite the competition from abundant cytosolic chaperones*® 
that could potentially bind SSs or TMDs. For most mRNAs, SRP 
does not need to scan translating ribosomes rapidly for binding of 
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Figure 4 | Translation and the role of SRP. a, Distributions of RNA-seq 
SRP enrichment scores from secretory protein transcripts (SS, TMD, 
SS-TMD or tail-anchored), with or without puromycin treatment. 
Included ORFs have at least twofold SRP enrichment without puromycin. 
b, Transcripts are retained on the membrane through binding of the RNC 
to the translocon. It is also possible that mRNA binding proteins at the ER 
bind transcripts. c, The prt1-1 allele prevents initiation at non-permissive 
temperatures. Translational run-off removes all ribosomes from 
transcripts. d, Distributions of RNA-seq membrane-enrichment scores of 
secretory protein transcripts (n = 584). e, After mRNA export, a pioneer 
round of targeting directs secretory transcripts to the ER membrane. SRP 
is specifically pre-recruited to transcripts that will present a functional 
targeting signal. After emergence of an SS or TMD, SRP directs RNCs to 
the ER membrane. Once at the ER membrane, transcripts are retained over 
several rounds of translation. 


targeting sequences while ignoring near-cognate cytosolic hydrophobic 
sequences®. Instead, several mechanisms bias towards the correct 
SRP-RNC interactions (Fig. 4e). For most secretory mRNAs, SRP 
binds before targeting signals are synthesized, in a pioneer round 
of cytoplasmic translation. Pre-recruited SRP is thus poised to 
recognize the SS or TMD after emergence of a targeting signal from 
the ribosome” and facilitate membrane attachment. For a smaller 
fraction of clients with more hydrophobic-targeting signals, SRP 
recruitment is initiated by binding RNCs that fully expose SS or 
TMD in the nascent chain. We do not observe an SRP-induced 
elongation arrest, but some mRNAs have intrinsic elements attenu- 
ating elongation upon signal exposure. Since membrane targeting 
is in kinetic competition with continued elongation, posttransla- 
tional targeting dominates for proteins with a late targeting signal. 
Once at the membrane, secretory mRNAs remain bound through 
subsequent rounds of initiation and translocon engagement. 
These hydrophobic proteins will no longer compete with soluble 
proteins for cytosolic quality control components. Conversely, 
transcripts not captured in this first round of selection become more 
likely to encounter cytosolic chaperones. One important and surprising 
conclusion is that cotranslational events governing nascent polypeptide 
fate are not only guided by the nascent chain itself, but also rely on 
additional aspects of translation, such as mRNA itself and cellular 
organization. These findings illustrate the multi-layered nature of 
protein biogenesis fidelity. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized, and the investigators were not blinded to 
allocation during experiments and outcome assessment. 

Yeast strains. Ribosome profiling (Ribo-seq) and qPCR assays were performed 
using BY4741 Srp72-TAP::His3MX, obtained from Open Biosystems*!. BY4741 
SEC61-GFP::His3MX was obtained from Invitrogen. Strain CY2522, containing 
the prt1-1 temperature-sensitive allele**, was provided by E. Craig. BY4741 
PMP1Δ::kanMX4 was obtained from the yeast deletion library**. 

Ribosome profiling. For each biological replicate, six 500-ml cultures of YPD 
were grown with shaking at 30°C to OD600 nm = 0.8-1.0, and collected one at a 
time by filtering through a 0.22-µm membrane. Cells were scraped off the filter in a 
single motion using a metal scoopula and then immersed in liquid nitrogen. When 
indicated, CHX treatments were performed by adding the drug to the culture to 
100 µg ml⁻¹ immediately before filtering. Filtration completed in approximately 
2 min, and cells were scraped and immersed in liquid nitrogen within 3 s. 

Lysis buffer comprised 50 mM potassium MOPS, pH 7.2, 275 mM potassium 
glutamate, 5 mM magnesium acetate, 1 mM DTT, 100 µg ml⁻¹ CHX, and 
20 U ml⁻¹ Superase•In (Ambion). Two 3-ml aliquots were supplemented with 
Complete Protease Inhibitor Cocktail, EDTA-free (Roche) and frozen dropwise 
in liquid nitrogen. One 3-ml aliquot of frozen lysis buffer was combined with cells 
from 1.5 l of culture in a 50 ml ball mill chamber chilled in liquid nitrogen (Retsch). 
Cells were pulverized for 1 min at 20 Hz in a MM-301 mixer mill. Pulverized cells 
from 3 l of culture were combined and thawed in a room temperature water bath. 
Lysates were immediately centrifuged in a Type 70.1 Ti rotor (Beckman) for 10 min 
at 12,000 r.p.m. The following were then added to the supernatant: Triton X-100 
to 0.01%, heparin sulfate to 0.2 mg ml⁻¹, and PMSF to 1 mM. Heparin was added 
as an RNase inhibitor only after fractionation as it may dislodge ribosomes from 
the membrane*’. A low concentration of Triton X-100 reduces bead clumping 
during immunoprecipitation, and prevents aggregation upon elution. A portion 
of the supernatant was retained as the total soluble fraction. Three millilitres 
of lysis buffer supplemented with Triton X-100 to 1% and heparin sulfate to 
0.2 mg ml⁻¹ were added to the pellets. Pellets were resuspended using a glass 
dounce homogenizer fit to the internal diameter of the tube. Membrane extracts 
were centrifuged as before, and the detergent-extracted supernatant is recovered 
as the total membrane-bound fraction. 

One millilitre of streptavidin-conjugated magnetic beads (Pierce) was saturated 
with biotinylated total rabbit IgG (Calbiochem). Beads were incubated with the 
total soluble fraction for 1h at 4°C. Beads were then washed three times with 1 ml of 
wash buffer, which consisted of lysis buffer supplemented with 0.2 mg ml! heparin 
and 0.01% Triton X-100. Beads were then incubated with 10011 wash buffer, 
2 ul SuperaseeIn and 311 AcTEV protease (Invitrogen) for 1 h at room temperature. 
The first eluate was retained on ice, the digest was repeated, and the eluates com- 
bined, yielding the total SRP-bound fraction. The eluate was immediately quantified 
using A260 nm and the total soluble and membrane-bound fractions were diluted 
in wash buffer to equivalent concentrations in 200 µl. Samples were layered on a 
7–47% sucrose gradient prepared in wash buffer omitting RNase inhibitors. 
Gradients were centrifuged in a SW 41 Ti rotor (Beckman) for 2.5 h at 39,000 r.p.m., 
and fractionated using a UA-6 detector and Foxy Jr. fraction collector (ISCO). 
Five 1-ml fractions containing the polysomes were combined, as were two 1-ml 
fractions containing the monosomes. Samples were diluted to 6 ml in wash buffer 
without RNase inhibitors and centrifuged for 12 h at 50,000 r.p.m. in a Type 70.1 
Ti rotor. 

Ribosome pellets were resuspended in 250 µl of cutting buffer comprising 20 mM Tris, 
pH 7.5, 140 mM potassium chloride, 1.5 mM magnesium chloride, and 
0.01% Triton X-100. Concentrations were typically 10–100 ng µl⁻¹ as measured 
by A260 nm. Samples with greater concentration were diluted to 100 ng µl⁻¹ in 
250 µl. Fifty units of RNase I (Ambion) were added to each sample, and digests 
proceeded for 1 h at room temperature. Digests were stopped with the addition 
of 2 µl Superase•In, transferred to a MLA-130 centrifuge tube (Beckman) 
and underlaid with 750 µl of 35% sucrose in cutting buffer. Ribosomes were 
pelleted by centrifugation for 4.5 h at 70,000 r.p.m. Total RNA was extracted from 
the pellets using a miRNeasy kit (Qiagen). Libraries were prepared as previously 
described**, quantified by qPCR (Kapa Biosciences), and sequenced using a HiSeq 
2500 (Illumina). 

RNA-seq. To test disruption of elongating ribosomes, yeast from one 
500-ml culture of BY4741 Srp72-TAP::His3MX were lysed in ribosome profil- 
ing lysis buffer, which includes CHX. Yeast from a second culture were lysed in 
buffer prepared without CHX that was supplemented with 0.5 mM puromycin, 
1 mM ATP, 1 mM GTP, 10 mM creatine phosphate and 40 µg ml⁻¹ creatine kinase. 
Both lysates were incubated at room temperature for 10 min after thawing. 

To test the effect of initiation in vivo, two 500-ml cultures of CY2522 (prt1-1) and 
two 500-ml cultures of W303 were grown with shaking at 25 °C to OD600 nm = 0.6. 
One culture of each strain was shifted to 37 °C, and the four cultures were grown 
one additional hour followed by harvesting by fast filtration. 

Samples of the total soluble, total membrane, and total SRP-bound fractions 
were prepared as described for ribosome profiling. RNA was extracted using the 
hot SDS-phenol-chloroform method, and mRNA was purified using oligo-dT 
beads according to the manufacturer’s instructions (NEB). Eluted mRNA was 
then fragmented under alkaline conditions’, and fragments of 35-50 nucleotides 
were purified by PAGE. Libraries were constructed as for Ribo-seq and sequenced 
using a HiSeq 4000 (Illumina). 

Data processing for enrichment scores. Adaptor sequences were trimmed from 
sequencing reads using Cutadapt””. Two rounds of alignment were performed using 
Bowtie and Tophat**”. First, sequences were aligned against a library comprising 
mature ribosomal RNA with Bowtie. Unaligned reads were retained and then 
aligned against S288C Release 64-1-1 (http://www.yeastgenome.org) using Tophat. 
Any read with more than one match was removed, and reads were assigned to 
ORFs and counted using the GenomicAlignments package in Bioconductor*™". 
Dubious ORFs were omitted. 

Identification of targeting signals and classification of ORFs. Different SS 
and TMD prediction programs vary in their output, and so we included only 
consistently predicted targeting signals in our analysis. Protein sequences were 
given to SignalP V3 (ref. 42), Phobius”’, Philius**, and TMHMM“*. The following 
scheme was used for classification. 

Mitochondrial. All mitochondrially encoded proteins, and nuclear-encoded 
proteins that localized to this organelle in at least one of two fluorescence-based 
screens*™*°. 

SSs. SSs from non-mitochondrial proteins predicted by SignalP, Phobius and 
Philius that did not contain any TMDs predicted by Phobius or Philius. The first 
residue of the hydrophobic domain, as predicted by Phobius, was designated as 
the first signal residue. Predictions by TMHMM within the first 50 residues were 
ignored. SSs were defined as ‘looped’ if they were enriched on Ssh1p at least 90 codons 
after the first SS codon'®. Every GPI-anchored protein (as previously annotated'') 
satisfies conditions for an SS, and was included. Some GPI-anchored proteins have 
predicted TMDs near the stop codon; these are not exposed cotranslationally. 

TMDs. The first TMDs of non-mitochondrial proteins were predicted by 
TMHMM, Phobius, and Philius within five amino acids for each pair of predictors 
(that is, Philius versus Phobius, Philius versus TMHMM, Phobius versus 
TMHMM). The first signal location was designated as the average of the three 
predictions, rounded down. If the TMD was within the first 50 codons, predictions 
by SignalP were ignored. Phobius and Philius did not predict a cleavable SS. 

Tail-anchored. First TMDs that begin within 50 amino acids of the stop codon 
were designated tail-anchored. 

Signal sequences with transmembrane domains (SS-TMD). SSs were predicted 
as above with at least one TMD predicted by both Phobius and Philius. 

Cytonuclear. ORFs that had no predicted SS or TMD by SignalP, TMHMM, 
Phobius, or Philius, and which did not appear in the mitochondria. Since only 
fluorescence localization data were used to designate mitochondrial proteins, this 
set includes some true mitochondrial proteins. 

Exceptions. All remaining sequences had an SS or TMD predicted by SignalP, 
Phobius, Philius, or TMHMM that was not predicted by the other programs. 
Because of the ambiguity in type and location of the targeting signal, these proteins 
are excluded from our analysis. Other exceptions included proteins with predicted 
TMDs from position 50 or later, as well as a SignalP SS prediction but no Phobius or 
Philius SS. We considered this ambiguity in the prediction of an SS, and excluded 
these ORFs. 
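
The scheme above amounts to a small decision procedure. The following Python sketch is one possible reading of it, not the authors' R code. The `Predictions` record, its field names and the boundary choices are assumptions made for illustration: a TMHMM call counts against an SS only after residue 50, a TMD "within the first 50 codons" is taken as a start position of 50 or less, and tail-anchored is taken as 50 or fewer residues remaining after the TMD start.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Predictions:
    """Per-ORF predictor output (hypothetical record, not a real tool format)."""
    length: int                        # protein length in residues
    mitochondrial: bool = False        # mito-encoded or seen in a localization screen
    signalp_ss: bool = False
    phobius_ss: bool = False
    philius_ss: bool = False
    tmhmm_tmd: Optional[int] = None    # start of first predicted TMD, or None
    phobius_tmd: Optional[int] = None
    philius_tmd: Optional[int] = None


def classify(p: Predictions) -> str:
    """Assign one ORF to a class under the simplified rules described above."""
    if p.mitochondrial:
        return "mitochondrial"
    ss_calls = (p.signalp_ss, p.phobius_ss, p.philius_ss)
    tmd_calls = (p.tmhmm_tmd, p.phobius_tmd, p.philius_tmd)
    if not any(ss_calls) and all(t is None for t in tmd_calls):
        return "cytonuclear"
    if all(ss_calls):
        # Assumption: only TMHMM calls beyond residue 50 conflict with an SS.
        tmhmm_counts = p.tmhmm_tmd is not None and p.tmhmm_tmd > 50
        if p.phobius_tmd is None and p.philius_tmd is None and not tmhmm_counts:
            return "SS"
        if p.phobius_tmd is not None and p.philius_tmd is not None:
            return "SS-TMD"
        return "exception"
    if all(t is not None for t in tmd_calls) and not (p.phobius_ss or p.philius_ss):
        # Pairwise agreement within five residues is the same as max - min <= 5.
        agree = max(tmd_calls) - min(tmd_calls) <= 5
        start = sum(tmd_calls) // 3    # average of three predictions, rounded down
        # A SignalP call is ignored only when the TMD starts early.
        if agree and (start <= 50 or not p.signalp_ss):
            if p.length - start <= 50:
                return "tail-anchored"
            return "TMD"
    return "exception"
```

Anything that falls through every rule is returned as an exception, mirroring the exclusion of ambiguous ORFs from the analysis.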

Enrichment analysis. Count-based sequencing assays report changes in a 
transcript’s abundance as changes in its proportion of the total sequencing reads. 
Thus, when a subset of transcripts is enriched in a sample, the proportion of reads 
from all other transcripts decreases by a corresponding amount. Here we assume 
that enrichment on the membrane or SRP represents active selection of certain 
transcripts, and the depletion of all others is passive. In other words, we assume 
that few transcripts will be specifically prevented from appearing in the membrane 
or SRP. The distributions of the ‘Cytonuclear’ sets in Figs 1a and 2b, which skew 
towards enrichment, are consistent with this assumption. Thus, we expect the 
distribution of non-enriched transcripts to be similar to their overall expression 
or translation. This makes direct comparison of different enrichment scores (that 
is, SRP enrichment vs membrane enrichment, under various drug treatments etc.) 
calculated from proportional abundance!” unintuitive, because a component 
of enrichment appears as depletion of non-enriched mRNA. 

In current approaches for differential gene expression, most genes are assumed 
to have unaltered abundance, and library sizes are normalized by a robust estimator 
such as the median ratio method*® or the trimmed mean of M-values (TMM)??. 
However, we expected changes for up to a third of ORFs. We used the median ratio 
method of DESeq to derive library scale factors using reference ORFs, selected as 
those designated as ‘cytonuclear’. We applied these scale factors to the counts for 
every ORF, and calculated enrichment as the ratio of scaled reads between sets. 
Biological replicates were scaled separately and then averaged. The robust nature 
of scale estimation allowed for extreme cases of the reference set to have high 
enrichment scores. Scores were only reported for ORFs that have at least 100 total 
reads between replicates in each of the compared samples”. 

Figure 2c included proteins designated as SS, TMD and SS-TMD that have 
at least 100 reads in all four data sets, with the following exceptions: YEL050C, 
YLR077W, YML061C, YOL053W and YPL132W, which were all observed on the 
surface of the mitochondria’ and had at least fourfold membrane enrichment 
here. Membrane enrichment after CHX treatment was determined using total 
ribosome-protected reads from the soluble and membrane fractions. 

Mapping of ribosome-protected reads to codons. Reads were mapped to 
codons using an alternate method. After filtering rRNA, reads were aligned 
to a Bowtie library comprising coding sequences, plus the stop codon and 
21 nucleotides flanking upstream and downstream. Using combined data from 
the SRP-pulldown and membrane polysome replicates, ORFs for which at least 
20% of reads could map to a second ORF were removed, leaving a working set of 
5,441 genes. Footprints of 26-35 nucleotides were processed separately for each 
length. The nucleotide that mapped to the centre of each read (rounded down) 
was given a value of 1, and reads were summed at each nucleotide position. 
A metagene analysis was performed™ and for each footprint length, an integer 
offset was determined so that the characteristically large peak at the start codon 
was maximized at the second nucleotide position (that is, aTg). Then, reads of 
all lengths were offset and combined. Nucleotide reads were summed for each 
codon. 
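
A minimal Python sketch of this read-to-codon assignment is given below. It is not the authors' code: the read representation, the `FLANK` constant name, the example offsets and the choice of `(length - 1) // 2` as the rounded-down centre are assumptions for illustration. In practice the per-length offsets would come from the metagene calibration at the start codon.

```python
FLANK = 21  # nucleotides of flanking sequence upstream of the start codon


def codon_counts(reads, offsets, n_codons):
    """Assign each read to one codon of a flanked coding sequence.

    reads:    iterable of (start, length); start is the 0-based index of the
              read's 5' end in the flanked transcript.
    offsets:  {footprint_length: integer offset}; lengths absent here are skipped.
    n_codons: number of codons to report, counted from the start codon.
    """
    counts = [0] * n_codons
    for start, length in reads:
        if length not in offsets:
            continue
        centre = start + (length - 1) // 2       # central nucleotide, rounded down
        nt = centre + offsets[length] - FLANK    # 0 = the A of the start codon
        if nt < 0:
            continue                             # falls in the upstream flank
        codon = nt // 3                          # sum nucleotide reads per codon
        if codon < n_codons:
            counts[codon] += 1
    return counts
```

With hypothetical offsets `{28: 1, 29: 0}`, a 28-nucleotide read starting at index 8 lands on the second nucleotide of the start codon and is counted in codon 0.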

Read distributions. For each sample, the total reads from elongating ribosomes 
were determined by adding counts from all codons excluding the first two and 
last two sense codons. Reads from biological replicates were summed to increase 
overall read depth, but owing to the high overall reproducibility, all of our 
conclusions can be demonstrated by treating replicates separately. The reads at each 
codon position are then divided by this total and multiplied by one million to yield 
reads per million (RPM). Values at each codon are smoothed using an 11 residue 
rolling average. The positions of TMDs in topology diagrams were taken from 
the TMHMM prediction. We caution that predictors may differ in the number 
and position of subsequent TMDs. Positions of SSs are the H-region predicted 
by Phobius. The point at which an SS or TMD begins to emerge is considered 
40 codons after the first encoded residue of the signal. 
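
The normalization and smoothing steps can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' implementation: counts are held as one list per ORF covering sense codons only, the rolling window is centred, and windows are truncated at the ends of the ORF.

```python
def elongating_total(orfs):
    """Total reads over all ORFs, excluding the first two and last two sense codons."""
    return sum(sum(counts[2:-2]) for counts in orfs.values())


def to_rpm(counts, total):
    """Scale per-codon counts to reads per million of the elongating total."""
    return [1e6 * c / total for c in counts]


def rolling_mean(values, window=11):
    """Centred rolling average; the window is truncated at either end."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

The same `rolling_mean` helper with `window=5` would correspond to the smoothing used for metagene plots.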

For metagene plots, reads at each codon are smoothed using a 5 residue rolling 
average. ORFs are then aligned as indicated, and the median and interquartile 
ranges are calculated at each position. For each ORF in Figs 1c and 2d, reads at 
each codon position are divided by the mean reads per codon within the range 
+20 to +40 after first signal codon. Included ORFs have at least 20 reads within 
this window in each data set shown. The first 30 codons of each ORF are excluded 
to avoid the universally observed low-density region near the start codon. 

Identification of SRP recruitment to the nascent chain. Increases in 
ribosome-protected reads in the soluble SRP-bound polysome data set were 
observed for a subset of ORFs at codon positions coincident with the exposure 
of targeting signals. We developed a clustering scheme that sorted ORFs by the 
shape of the distribution of ribosome-protected reads specific to SRP-bound 
polysomes. The test set comprised 568 SS, TMD or SS-TMD proteins with 
an average of at least 3 reads per codon in both soluble SRP-bound polysome 
and membrane-bound polysome sets. For each ORF, we first smoothed read 
counts from each data set with a leading 15-residue moving average window. 
We corrected for local features intrinsic to the sequence (that is, appearing in all 
fractions) by dividing the smoothed SRP-bound polysome reads by the smoothed 
membrane-bound polysome reads at each codon; positions with fewer than 3 reads 
in either smoothed set were omitted. Peak codon positions were identified as the 
maximum value within 30-180 residues after the first predicted targeting signal 
codon. Each ORF was scaled by dividing the value at each codon position by the 
mean value over the range from 50 codons before to 200 codons after the peak. 
Codon positions outside this range were discarded. Scaled values were then used 
to generate an empirical cumulative distribution function (ECDF). The ECDF was 
sampled from 0.0 to 2.0 in 0.1 steps. The samplings from the ECDFs were used 
for agglomerative hierarchical clustering using a Euclidean distance function and 
Ward’s minimum variance method*". The first split in the population distinguished 
ORFs having strong peaks from ORFs with weak or no peaks. 
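
The feature vector passed to clustering can be sketched in Python as follows. This is a simplified illustration, not the authors' code: function names are hypothetical, the input is assumed to be a complete list of SRP-to-membrane ratios with no omitted positions, and the clustering step itself (Euclidean distance, Ward's method) is not shown.

```python
from bisect import bisect_right


def scale_to_window_mean(ratios, peak, before=50, after=200):
    """Keep codons from `before` upstream to `after` downstream of the peak
    and divide each value by the mean over that window."""
    lo, hi = max(0, peak - before), min(len(ratios), peak + after + 1)
    window = ratios[lo:hi]
    mean = sum(window) / len(window)
    return [r / mean for r in window]


def ecdf_features(scaled):
    """Sample the empirical CDF at 0.0, 0.1, ..., 2.0 (21 values)."""
    data = sorted(scaled)
    grid = [i / 10 for i in range(21)]
    # bisect_right counts the values <= x, so each entry is F(x).
    return [bisect_right(data, x) / len(data) for x in grid]
```

An ORF with a sharp peak has most scaled values well below 1, so its ECDF rises early; a flat profile concentrates near 1 and rises late, which is what separates the two groups.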

To analyse the distances between the first signal codon and the peak, peaks 
must be unambiguously assigned to the first targeting signal; otherwise the 
peak may be due to a later TMD. The distance to the peak was compared to the 
distance between the first TMD and the second (as determined by TMHMM) or 
the distance between the SS and the first TMD (as determined by Phobius). Peaks 
were considered unambiguous only if no more than 8 residues of the next TMD 
were translated. This value is the length of the shortest functional signal sequence 
in our set, controlling for any effect that an additional signal within the exit tunnel 
may have on SRP. 

An alternative approach was used to determine codons with significant 
recruitment of SRP in mitochondrial transcripts and transcripts lacking an 
ER-targeting signal. For every ORF, a count matrix was built with codon position 
as rows, and replicates of SRP-bound polysome ribosome-protected reads, total 
soluble polysome reads and total membrane polysome reads as columns. These 
matrices were individually input to DESeq2 (ref. 48), and a linear model was fit 
using the presence or absence of SRP co-immunoprecipitation as the coefficient; 
in this application, codons were treated as ‘genes’. Reads at each codon were 
used for fitting local dispersion trends. Genes with at least one codon that had at 
least threefold enrichment with P < 0.001 were selected. We note that all six SRP 
subunits had local enrichment towards the C terminus; since this may be cotrans- 
lational particle assembly, we omitted them from further analysis. Thirteen other 
genes were identified, and binding sites were assigned to the first significantly 
enriched codon preceding the position with maximum enrichment. 

Quantification of early SRP enrichment. Secretory protein ORFs which were 
used in the analysis of SRP recruitment to the nascent chain (see previous section) 
were also tested for SRP pre-enrichment by comparing ribosome-protected reads 
from SRP-bound polysomes and monosomes from a CHX-treated culture. 
The RPM values from codon 10 to the position of the first SS or TMD, plus 40, 
were added, and the monosome sum was divided by the polysome sum. The first 
9 sense codons were omitted to avoid artefacts near the start of transcripts. Ratios 
of greater than 1 were designated pre-recruited. 
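
As a worked illustration of this ratio, the Python sketch below sums RPM values over the stated window and divides monosome by polysome. The function name, the 1-based `signal_codon` argument and the inclusive end of the window are assumptions, not details confirmed by the text.

```python
def pre_recruitment_ratio(mono_rpm, poly_rpm, signal_codon, emerge=40, skip=9):
    """Monosome / polysome RPM ratio from codon 10 to signal_codon + emerge.

    mono_rpm, poly_rpm: per-codon RPM lists (index 0 = first sense codon).
    signal_codon:       1-based position of the first SS or TMD codon.
    Returns None when the polysome sum is zero.
    """
    end = signal_codon + emerge          # last codon included (1-based)
    mono = sum(mono_rpm[skip:end])       # skipping the first 9 sense codons
    poly = sum(poly_rpm[skip:end])
    if poly == 0:
        return None
    return mono / poly
```

A returned value above 1 would correspond to the pre-recruited designation.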

The enrichment in Fig. 3d was determined by first smoothing SRP-bound 
monosome and polysome reads (in RPM) using an 11 codon window, and then 
dividing the monosome values by the polysome values. 

GFP reporter constructs. Sequences of the yeast TUB2 5’ UTR (300 nucleotides 
preceding the start codon), the TUB2 3’ UTR (300 nucleotides following the 
stop codon), the PMP1 3’ UTR (600 nucleotides), and the PMP2 3’ UTR (500 
nucleotides) were PCR amplified from BY4741 genomic DNA with flanking 
overlaps to the M13 (−20) forward or M13 reverse sequences, and to the beginning 
or end of the sfGFP ORF sequence®”. The sequence of sfGFP was amplified from 
a pET33b-derived expression vector provided by W. Clemons. Plasmids were 
assembled in a single reaction using Gibson assembly** using the M13 (−20) 
forward and M13 reverse sequences to amplify pRS315. Plasmids were transformed 
into BY4741 Srp72-TAP::HIS3MX. 

qPCR. For each biological replicate, 500 ml of synthetic complete media lacking 
leucine were inoculated with an overnight culture to OD600 nm of 0.05. Cultures 
were grown at 30 °C to OD600 nm of 0.8–1.0 and then collected by fast filtration. Cells 
were lysed and fractionated, and Srp72p was immunoprecipitated as described for 
RNA-seq. Purified RNA was subjected to TURBO DNase digestion (Ambion). 
Concentrations were determined using A260 nm, and 100 ng of RNA was used to 
synthesize cDNA using iScript (Bio-Rad). qPCR was performed on a CFX-96 
thermocycler using iTAQ Universal SYBR Green Supermix (Bio-Rad). ACT1 was 
used as a reference for each fraction from the same culture, and enrichments were 
determined as fold difference of mRNA in the membrane or SRP-bound fractions 
over the soluble fraction. Enrichment scores from three technical replicates of the 
qPCR step were averaged, and biological replicates are shown. 

Statistical hypothesis testing. All analysis was performed using the R program- 
ming language (https://www.r-project.org). Statistical significance in comparing 
distributions of SRP or membrane enrichment scores from ribosome profiling 
(Figs 1a and 2b), as well as in comparing hydrophobicity scores (Extended Data 
Fig. 2e), codon usage, or residue abundance (Extended Data Fig. 3d, e) was 
determined using two-sided Wilcoxon rank-sum tests. This test assumes 
independence of observations and does not require a normal distribution. 
Enrichment distributions are multimodal and so mean and variance estimates are 
not provided. Significance of SRP or membrane enrichment from qPCR (Fig. 3g, h 
and Extended Data Fig. 6c, h) was determined using a two-sided Welch’s t-test 
on log-transformed enrichment values. This test assumes normal distributions 
but allows unequal variance. In all tests, the null hypothesis is that the distributions of 
tested populations are equal. 
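
For readers unfamiliar with Welch's test on log-transformed values, the Python sketch below computes the test statistic and the Welch–Satterthwaite degrees of freedom using only the standard library. It is an illustration, not the authors' R analysis; the function name is hypothetical, and base-2 logarithms are an arbitrary choice because the statistic does not depend on the base.

```python
import math
from statistics import mean, variance


def welch_t_on_logs(a, b):
    """Welch's t statistic and degrees of freedom for log-transformed samples.

    a, b: sequences of positive enrichment values (at least two each).
    """
    la = [math.log2(x) for x in a]
    lb = [math.log2(x) for x in b]
    # Squared standard errors; variances are not pooled.
    va = variance(la) / len(la)
    vb = variance(lb) / len(lb)
    t = (mean(la) - mean(lb)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(la) - 1) + vb ** 2 / (len(lb) - 1))
    return t, df
```

A two-sided P value then follows from the t distribution with `df` degrees of freedom; in Python, `scipy.stats.ttest_ind(la, lb, equal_var=False)` performs the whole test in one call.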

Code availability. Scripts for data processing and analysis in R are available upon 
request. 

Extended Data Figure 1 | Cotranslational membrane enrichment. 

a, Crude lysates were fractionated, and then polysomes were recovered 
by sucrose gradient ultracentrifugation and used for ribosome profiling. 
b, Enrichment of ribosome-protected mRNA reads in the membrane 
polysome fractions over the soluble polysome fractions from two 
biological replicates. Every dot represents one ORF. c, Metagene plots 
of soluble polysome ribosome-protected reads of transcripts encoding 
proteins lacking ER-targeting signals (top), or of membrane-bound 
polysome-protected reads of transcripts encoding secretory proteins 
that were at least twofold membrane-enriched (bottom). For each ORF, 
ribosome-protected reads at each position were scaled by dividing by the 
mean reads per codon of the ORF, excluding the first two and last two 
sense codons. The median scaled reads at each position are plotted as a 
line, and the interquartile range is shaded in grey. d, Ribosome- 
protected reads at each codon of an example secreted protein, 
β-1,3-glucanosyltransferase (GAS1), a model SRP-independent protein”. 
Topology is indicated above, with the signal sequence in lavender. The 
position where the signal begins to emerge from the ribosome exit tunnel 
is indicated. e, The number of codons remaining after the encoding of 
the first residue of an SS, and the corresponding membrane enrichment 
per SS-containing ORF. Signal sequences were divided between those 
that bind Ssh1p directly upon exposure and those that require a looped 
conformation (>90 codons after the first SS codon)’*. f, Transcripts 
remain at the membrane by subsequent translocon binding, thus the small 
soluble fraction comprises mRNA undergoing initial targeting. 

Extended Data Figure 2 | Cotranslational enrichment of SRP. 
a, Enrichment of ribosome-protected mRNA reads in the soluble SRP- 
bound polysome fractions over the total soluble polysome fractions 
from two biological replicates. b, The number of codons remaining after 
encoding of first SS or TMD residue, and the corresponding SRP and 
membrane enrichment scores per ORF. Scores are determined from 
cultures harvested without added CHX. Enrichment scores are indicated 
with filled dots, and the scores from the same transcript are linked with 
a grey line. The vertical dashed line indicates 50 codons, the boundary 
for tail-anchored proteins. Here, only SSs that bind Ssh1p directly after 
exposure from the RNC are shown. c, Secretory transcripts were classified 
into two groups based on the ribosome-protected-read distributions from 
SRP-bound polysomes. Some showed a pronounced increase in reads at 
positions coincident with the initial exposure of an SS or TMD by the 
ribosome, whereas others did not. Shown here are metagene analysis plots 
of soluble polysome-protected reads from the categorized TMD proteins. 
For each ORF, the reads at each codon position were divided by the mean 
reads per codon within the range +20 to +40 after the first signal codon. 
The first 30 codons of each ORF are excluded to avoid the characteristic 
low-density region near the start codon. The lavender line indicates when 
the first TMD begins to emerge from the exit tunnel, and the dashed line 
indicates the position of the read peak. Notably, the total soluble polysome 
reads depleted in a similar manner for both classes, a read increase was 
not observed in the total soluble reads, and reads from the SRP-bound 
transcripts with a peak did not deplete faster than the total soluble reads. 
These features are consistent with a model in which SRP is recruited 
at the peak site, and elongation then proceeds at the same rate. d, The 
number of codons remaining after encoding of the first SS or TMD and 
corresponding SRP enrichment. Transcripts are classified by the presence 
or absence of a read increase following signal exposure, as in c. Note that 
for SRP-enriched transcripts with signals closest to the terminus (<100 
codons), evidence of direct binding between SRP and the nascent chain 
was always observed. SRP can therefore bind late TMDs immediately 
after they become exposed by the ribosome. e, Maximum hydrophobicity 
across targeting signals using an 8-residue averaging window. Only 
signals with peaks that could be unambiguously attributed to a targeting 
signal were included. Hydrophobicity was determined by attributing 
the biological hydrophobicity score to each encoded amino acid™. 
***P < 0.001, Wilcoxon rank-sum test. f, Distribution of the distance 
between the first codon of a targeting signal and the position of the 
downstream read increase. Only transcripts wherein the increase can be 
unambiguously attributed to a specific targeting signal were included. 

Extended Data Figure 3 | Elongation pausing and local SRP 
recruitment. a, b, Local increases in ribosome-protected reads from 
membrane-bound polysomes, indicated by orange lines, were coincident 
with rare codons, as in cell division cycle protein 1 (CDC1, a) or polybasic 
nascent chains, as in the plasma membrane G-protein-coupled receptor 
(GPR1, b). Soluble SRP-bound polysome-protected reads were further 
increased at the same positions. c, In these cases, hydrophobic sequences 
in the nascent chain were exposed to the cytosol at the locations of 
increased reads, which were coincident with elongation attenuators. 
d, Translational efficiencies for the 6 codons following, and the number 
of stalling residues within the 10 residues preceding, the sites of increased 
SRP-bound ribosome reads. Translational efficiency was determined by 
attributing the normalized translational efficiency (nTE) score to each 
codon®°. Residues that were found to stall the ribosome, based on previous 
investigation*”°**’, were lysine, arginine, glutamate, aspartate, proline 
and glycine. Because of variation in specific motifs, and uncertainty in 
whether these motifs are additive, we simply compared the total number 
of these residues in the indicated 10 residue spans. Sets of 10,000 random 
sequences, at least 10 amino acids from the stop codon, were sampled from 
5,907 non-dubious ORFs, and translational efficiency and stalling 
residues were determined over 6 or 10 codon spans. *P < 0.05, 
**P < 0.01, Wilcoxon rank-sum tests. e, The targeting signals that 
recruited SRP directly to the nascent chain unusually far from the 
encoding of the signal had SRP-binding sites coincident with intrinsic 
elongation attenuation. Secretory protein transcripts that showed an 
increase in SRP-bound protected reads (see Extended Data Fig. 2c, f) were 
further classified by the position of the peak relative to the first signal 
codon. Transcripts with peaks found at least 80 codons after the signal had 
significantly lower translational efficiency in the 6 codons following the 
peak. These transcripts also had a greater, but not statistically significant, 
amount of stalling amino acids in the 10 residues preceding the peak. 
*P < 0.05, Wilcoxon rank-sum tests. f, Similar increases in SRP-bound 
reads were observed for certain non-secretory proteins as exemplified 
by phosphoacetylglucosamine mutase (PCM1) and tRNA(Ser) Um(44) 
2'-O-methyltransferase (TRM44). Hydrophobic sequences in non- 
secretory proteins, coupled with attenuation of elongation, may lead 
to SRP recruitment. 

Extended Data Figure 5 | Ribosome profiling of SRP-bound 
monosomes. a, Ribosome-protected reads, in tags per million (TPM) 
for each ORF, from SRP-bound monosome fractions from two biological 
replicates. b, Ribosome-protected reads from the soluble SRP-bound 
monosome and SRP-bound polysome fractions of the same biological 
replicate, with CHX treatment. c, Distribution of ribosome reads within 
example ORFs that display SRP-bound monosome and polysome profiles 
consistent with direct recognition of the nascent chain. d, If RNCs can 
recruit SRP while a TMD is within the exit tunnel, then there will be an 
increase in ribosome-protected reads from SRP-bound monosomes when 
the TMD begins to translate (lavender). This increase will maximize when 
the TMD is exposed to the cytosol (orange). e, Distribution of ribosome 
reads within example ORFs that display SRP-bound monosome profiles 
consistent with recruitment to transcripts before targeting signal synthesis. 
Examples are arranged for an increasing distance from the start codon to 
the first TMD. Only the first 600 codons for each ORF are shown. 

Extended Data Figure 6 | The role of the UTR from PMP1 and PMP2. 
a, The cotranslational SRP enrichment of the PMP1 and PMP2 ORFs 
was similar to other bona fide secretory proteins, such as SEC61. 
By contrast, cytosolic proteins such as tubulin (TUB2) were not enriched. 
The enrichment scores are determined from the SRP-bound and total 
soluble polysomes from two biological replicates collected without added 
CHX. b, Distribution of ribosome-protected reads from soluble polysomes 
within the PMP1 and PMP2 ORFs. c, Membrane enrichment, determined 
by qPCR, of the mRNA of GFP fused to the indicated 3’ UTRs. The 
coding sequence of endogenous SEC61 transcript was also amplified as a 
control for a membrane-localized transcript. **P < 0.01, n = 3 biological 
replicates, Welch’s t-test. d, Localization of mature GFP. Scale bar, 5 µm. 
Yeast were grown to mid-log phase and imaged using an Axio Observer 
Z1 with a Plan-Apochromat 100×/1.4 oil immersion objective (Zeiss). 
Z-stacks were deconvoluted by the iterative maximum likelihood 
algorithm in ZEN (Zeiss) and single planes are shown. Images are 
representative of a set of two replicated assays. e, Yeast growth after 
replacement of the endogenous 3’ UTR of PMP1 with the 3’ UTR of 
tubulin. Also shown is a complete deletion of PMP1 ORF**. Gibson 
assembly* was used to fuse the 300-nucleotide TUB2 3’ UTR to the 
KlURA3 cassette into SmaI-digested pUC19. The TUB2-UTR-URA3 
element was PCR amplified, including 40-nucleotide overhangs matching 
genomic sequences, and replaced the 650 nucleotides immediately 
following the PMP1 coding sequence in strain BY4741 by homologous 
recombination. Image is representative of a set of three replicated assays. 
f, Nascent-chain-independent SRP recognition may require ribosomes. 
Puromycin treatment of lysates disrupts elongating, but not initiating, 
ribosomes. g, Transcripts showing only canonical recognition are more 
sensitive to puromycin. This is consistent with puromycin resistance 
of SRP that has pre-recruited to initiating ribosomes. h, Membrane 
enrichment of the GFP-PMP1 construct or SEC61 mRNA after lysates 
were incubated with puromycin. **P < 0.01, n = 3 biological replicates, 
Welch’s t-test. 

Extended Data Figure 7 | The role of translation in membrane enrichment. a, Lysates were treated with puromycin before membrane fractionation. 
mRNA recovered from the soluble and membrane fractions was used for RNA-seq. b, Membrane enrichment of secretory protein transcripts (SS, TMD, 
SS-TMD, or TA, n = 729) following puromycin treatment of lysates. 

Mechanism of arginine sensing by CASTOR1 upstream of mTORC1 

Robert A. Saxton, Lynne Chantranupong, Kevin E. Knockenhauer, Thomas U. Schwartz & David M. Sabatini 


The mechanistic Target of Rapamycin Complex 1 (mTORC1) is 
a major regulator of eukaryotic growth that coordinates anabolic 
and catabolic cellular processes with inputs such as growth factors 
and nutrients, including amino acids'>. In mammals arginine is 
particularly important, promoting diverse physiological effects 
such as immune cell activation, insulin secretion, and muscle 
growth, largely mediated through activation of mTORC1 (refs 4–7). 
Arginine activates mTORC1 upstream of the Rag family of 
GTPases’, through either the lysosomal amino acid transporter 
SLC38A9 or the GATOR2-interacting Cellular Arginine Sensor 
for mTORC1 (CASTOR1)*"!”. However, the mechanism by which 
the mTORC1 pathway detects and transmits this arginine signal 
has been elusive. Here, we present the 1.8 Å crystal structure 
of arginine-bound CASTOR1. Homodimeric CASTOR1 binds 
arginine at the interface of two Aspartate kinase, Chorismate 
mutase, TyrA (ACT) domains, enabling allosteric control of 
the adjacent GATOR2-binding site to trigger dissociation from 
GATOR2 and downstream activation of mTORC1. Our data 
reveal that CASTOR1 shares substantial structural homology with 
the lysine-binding regulatory domain of prokaryotic aspartate 
kinases, suggesting that the mTORC1 pathway exploited an ancient, 
amino-acid-dependent allosteric mechanism to acquire arginine 
sensitivity. Together, these results establish a structural basis for 
arginine sensing by the mTORC1 pathway and provide insights into 
the evolution of a mammalian nutrient sensor. 

To understand the molecular mechanisms through which CASTOR1 
detects the presence of arginine and signals it to mTORC1, we deter- 
mined the crystal structure of arginine-bound CASTOR1 to 1.8 Å 
resolution (Extended Data Table 1). Our findings show that CASTOR1 
forms a rod-shaped homodimer, with the monomers associated in a 
side-by-side manner and rotated 180° with respect to each other 
(Fig. 1a). Although sequence analysis of CASTOR1 predicted the pres- 
ence of two ACT domains!”!3, the structure reveals that each monomer 
actually contains four tandem ACT domains. ACT1 displays the canon- 
ical βαββαβ ACT domain topology'*'°, whereas ACT2 contains two 
additional β-strands and ACT3 and ACT4 each lack the final β-strand 
(Fig. 1a and Extended Data Fig. 1a). 

The dimerization interface buries around 950 Å² of surface area 
at the intersection between the α1 helix of ACT1 and the α5 helix 
of ACT3 (Fig. 1b). Two inward-facing isoleucine residues of each 
monomer (Ile28 and Ile202) form the hydrophobic core of the sym- 
metrical interface, flanked on each side by tyrosine-histidine pairs 
(His25 and Tyr207) that form both π-stacking and hydrogen-bond 
contacts with the opposing monomer (Fig. 1b). To understand the 
importance of dimerization in CASTOR1 function, we generated 
constitutively monomeric mutants of CASTOR1 (Y207S and I202E; 
Fig. 1c). Notably, although dimerization is dispensable for arginine 
binding (Extended Data Fig. 2a), these mutants interacted weakly with 
GATOR2 and failed to inhibit mTORC1 signalling in cells (Fig. 1c and 
Extended Data Fig. 2b). This finding indicates that CASTOR1 must be 
dimeric to robustly inhibit GATOR2 upon arginine starvation. 

CASTOR1 binds arginine through a narrow pocket at the interface 
of ACT2 and ACT4, distal to the dimerization interface (Figs 1a, 2a, b). 
The side chain of arginine projects towards the β15 loop, a loop connecting 
β15 and β16, where the backbone carbonyls of Thr300, 
Phe301, and Phe303 coordinate the guanidinium group of arginine 
(Fig. 2a). Immediately adjacent to the β15 loop, the anionic side chain 
of Asp304 forms an additional stabilizing salt bridge with the cationic 
arginine side chain (Fig. 2a). On the opposite side of the pocket, the 
hydroxyl side chain of Ser111 and the backbone carbonyl of Val112 
in the α3 loop anchor the free amino group of arginine in place, while 
the free carboxyl group points towards a water-filled cavity that sep- 
arates it from ACT2 (Fig. 2a, b). Mutation of either Ser111 or Asp304 
(S111A, D304A) abolished the arginine-binding ability of CASTOR1 
in vitro, highlighting the critical role of these contacts in arginine 
sensing by CASTOR1 (Fig. 2c). Furthermore, when expressed in 
HEK-293T cells, these mutants bound constitutively to GATOR2 
and strongly inhibited mTORC1 signalling even in the presence of 
arginine (Fig. 2d). 

Together, these data explain the molecular determinants of specificity 
in the CASTOR1-arginine interaction. While Ser111 fixes the position 
of the free amine, the location of the β15 loop and Asp304 sets a strict 
length requirement for the bound ligand (Extended Data Fig. 3a). In 
addition, the positions of the three hydrogen-bond-donating nitrogen 
atoms in the guanidinium group facilitate contacts with both the 
carbonyl oxygen atoms in the β15 loop and the side chain of Asp304 
(Fig. 2a). Finally, the gap behind the free carboxyl group of arginine 
suggests that CASTOR1 can tolerate ligands with modifications to that 
functional group (Fig. 2b). We tested these predictions by investigating 
the ability of various arginine analogues to disrupt the CASTOR1- 
GATOR2 interaction in vitro (Fig. 2e and Extended Data Fig. 3b). 
Consistent with our structural analysis, while the carboxy-modified 
arginine methyl ester triggered full dissociation of CASTOR1 from 
GATOR2, compounds with alterations to the guanidinium group, 
α-amine, or the length of the side chain had no effect. 

Figure 1 | Architecture of human CASTOR1. 
a, Two orthogonal views of the CASTOR1 
homodimer (ribbon diagram), with ACT 
domains 1-4 coloured in green, purple, wheat, 
and pink, respectively. The bound arginine is 
shown in yellow. Disordered regions not observed 
in the crystal structure are omitted. b, View of 
the CASTOR1 dimerization interface, with side 
chains of key residues represented in stick form. 
c, Dimerization-deficient CASTOR1 Y207S 
and I202E mutants display weaker interactions 
with endogenous GATOR2. HEK-293T cells 
transiently expressing FLAG-tagged CASTOR1 
wild type (WT) and the indicated haemagglutinin 
(HA)-tagged constructs were starved of arginine 

In addition to the main pocket contacts described above, a highly 
conserved, glycine-rich loop connecting β14 and α7 in ACT4 (β14 
loop, residues 269-280) wraps over the arginine pocket, fully burying 
the bound ligand (Figs 2a, 3a and Extended Data Fig. 1a). The β14 
loop forms several hydrogen bonds with arginine through the backbone 
amides of Gly279 and Ile280, as well as the backbone oxygen 
atoms of Gly274 and Glu277 (Figs 2a, 3a). The ordered conformation 
of the β14 loop also places it just along the ACT2-ACT4 interface, 
enabling it to form several intramolecular contacts with residues in 
ACT2 (Fig. 3a). Cys278 forms hydrogen bonds with the backbones of 
Val110 and Ser111 in the α3 loop, while Asp276 forms a salt bridge with 
Arg126. In addition, Glu277 extends in the opposite direction to form 
another salt bridge with His175 (Fig. 3a). Thus, the β14 loop facilitates 
the formation of numerous inter-ACT-domain contacts in the presence 
of arginine. Indeed, the arginine and β14 loop contribute about 40% 
of the total buried surface area in the ACT2-ACT4 interface of the 
arginine-bound structure (390 Å² out of 980 Å²). 
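
The "about 40%" figure follows directly from the two buried surface areas quoted in the preceding sentence. A minimal arithmetic check (the variable names are illustrative and do not come from the paper):

```python
# Buried surface areas at the ACT2-ACT4 interface, in square angstroms,
# as quoted in the text for the arginine-bound structure.
ligand_and_loop_area = 390.0   # contributed by arginine plus the β14 loop
total_interface_area = 980.0   # total buried area at the interface

fraction = ligand_and_loop_area / total_interface_area
print(f"{fraction:.1%}")
```
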

The glycine-rich β14 loop is predicted to have a high propensity for 
disorder. Our structure suggests that these inter-ACT-domain contacts 
could stabilize it in an ordered conformation over the bound arginine. 
Indeed, mutation of key residues in both the β14 loop (D276A, E277A, 
C278A) and the adjacent ACT domains (R126A, H175A) significantly 
reduced the arginine-binding capacity of CASTOR1 (Fig. 3b, c), indicating 
that the inter-ACT-domain contacts formed by the β14 loop 
are required for arginine sensing by CASTOR1. In addition, we found 
that the N-terminal (ACT1 and ACT2) and C-terminal (ACT3 and 
ACT4) halves of CASTOR1 associated in both an arginine- and 
β14-loop-dependent manner when expressed as separate polypeptides 
in HEK-293T cells (ref. 12; Fig. 3d), indicating that arginine probably induces 
a conformational change in CASTOR1 by stabilizing the ACT2-ACT4 
interaction. 

Figure 2 | The arginine-binding pocket of CASTOR1. a, View of the 
arginine-binding pocket in CASTOR1, together with its Fo-Fc electron 
density map calculated and contoured at 4σ from an omit map lacking 
arginine. The bound arginine is shown in yellow. Hydrogen bonds or salt 
bridges are shown as black dashed lines. Residues 269-273 are omitted for 
clarity. b, Steric view of the arginine-binding pocket, depicting the surface 
representation of CASTOR1 and stick model of arginine (yellow). The β14 
loop (residues 269-280) is omitted for clarity. c, CASTOR1 S111A and 
D304A mutants do not bind arginine in vitro. FLAG-immunoprecipitates 
prepared from HEK-293T cells transiently expressing the indicated 
FLAG-tagged proteins were used in binding assays with [³H]arginine 
as described in the Methods. Values are mean ± s.d. for three technical 
replicates from one representative experiment. d, The CASTOR1 S111A 
and D304A mutants constitutively bind GATOR2 and inhibit mTORC1 
signalling in cells. HEK-293T cells transiently expressing FLAG-S6K1 and 
the indicated HA-tagged constructs were starved of arginine for 50 min 
and, where indicated, re-stimulated for 10 min. Both FLAG- and 
HA-immunoprecipitates were prepared from lysates and analysed as in Fig. 1c. 
e, Effects of various arginine analogues on the CASTOR1-GATOR2 
interaction in vitro. HEK-293T cells transiently expressing wild-type 
HA-CASTOR1 were starved of arginine for 50 min. HA-immunoprecipitates 
were prepared from cell lysates then incubated with 400 µM of the 
indicated compounds for 20 min and analysed as in Fig. 1c. 

Figure 3 | Arginine facilitates the intramolecular association of the 
ACT2 and ACT4 domains of CASTOR1. a, Top-down view of the 
arginine- and β14-loop-mediated contacts between ACT2 and ACT4. 
Hydrogen bonds and salt bridges are shown as black dashed lines. 
b, The CASTOR1 D276A, R126A, E277A, H175A, and C278A mutants 
display reduced arginine-binding capacity in vitro. Binding assays 
were performed and immunoprecipitates analysed as in Fig. 2c. Values 
are mean ± s.d. for three technical replicates from one representative 
experiment. c, The CASTOR1 D276A, R126A, E277A, H175A, and 
C278A mutants constitutively bind GATOR2 in cells. HEK-293T cells 
transiently expressing the indicated HA-tagged constructs were starved 
of arginine for 50 min and, where indicated, re-stimulated for 10 min. 
HA-immunoprecipitates were prepared and analysed as in Fig. 1c. 
d, CASTOR1 ACT1-2 (residues 1-169) and CASTOR1 ACT3-4 (169-329) 
associate in an arginine- and β14-loop-dependent manner. HEK-293T 
cells transiently expressing the indicated HA-tagged constructs were 
starved of arginine for 60 min and, where indicated, re-stimulated for 
60 min. HA-immunoprecipitates were prepared and analysed as in Fig. 1c. 

In addition to CASTOR1, human cells express a related protein, 
CASTOR2, which shares 63% sequence identity with CASTOR1 but 
does not bind arginine (ref. 12). Although the regions of CASTOR1 that are 
directly involved in arginine binding are well conserved (Extended 
Data Fig. 1a), we identified residues along the ACT2-ACT4 interface 
(His108 to Val110) that differ between CASTOR1 and CASTOR2 
(Extended Data Fig. 4a). Replacing these residues in CASTOR1 with 
those from CASTOR2 abrogated arginine binding in vitro and converted 
CASTOR1 to a nearly constitutive GATOR2 interactor in cells, 
resembling CASTOR2 (Extended Data Fig. 4b-d). Notably, these residues 
immediately precede Ser111 and form a hydrogen bond with 
Cys278 in the β14 loop (Fig. 3a and Extended Data Fig. 4a), suggesting 
that their identity may be critical for the proper positioning of the α3 
loop to enable arginine binding and/or the association of ACT2 and 
ACT4. The corresponding mutation in CASTOR2 (QNI108-110HHV), 
however, was not sufficient to confer arginine-binding ability, suggesting 
that additional amino acid differences also contribute to this functional 
difference (Extended Data Fig. 4d). 

To understand how arginine induces dissociation of CASTOR1 from 
GATOR2, we identified five highly conserved sites in CASTOR1 that are 
required for its interaction with GATOR2 (Y118, Q119, D121, E261 and 
D292; Fig. 4a and Extended Data Fig. 1a). Importantly, mutants at these 
sites still bind arginine in vitro and homodimerize when expressed in cells 
(Extended Data Fig. 5a, b). Notably, these residues cluster along the 
surface of the ACT2-ACT4 interface, adjacent to the arginine-binding 
pocket but on the opposite face of the protein (Fig. 4b, c). Glu261 and 
Asp292 are closely linked to the β14 loop, separated only by β14 and 
α7, respectively (Fig. 4c). Furthermore, the critically important residue 
Asp121 is buried in the ACT2-ACT4 interface, potentially explaining 
why the arginine-bound conformation of CASTOR1 does not interact 
with GATOR2 (Fig. 4c). 

Together, these results suggest a model in which arginine binding 
arranges the glycine-rich β14 loop in a conformation that enables 
the intramolecular association of ACT2 and ACT4 (Fig. 3a-d). The 
association of these domains would alter the position and exposure 
of the residues required for GATOR2 binding, which also lie along 
the ACT2-ACT4 interface (Fig. 4a-c), thereby triggering the dissociation 
of CASTOR1 from GATOR2 and the subsequent activation of 
mTORC1 (Fig. 4e). 

The observation that CASTOR1 inhibits mTORC1 signalling and 
interacts with GATOR2 in an arginine-sensitive manner suggests 
that CASTOR1 may regulate mTORC1 by inhibiting GATOR2, a 
mechanism analogous to that of the recently identified leucine sensor 
Sestrin2 (refs 16-19). Using our GATOR2-binding-deficient mutants, 
we were able to test this hypothesis directly. In contrast to wild-type 
CASTOR1, the GATOR2-binding-deficient YQ118-119AA and 
D121A mutants both failed to inhibit mTORC1 signalling in cells 
(Fig. 4d). Moreover, owing to their ability to dimerize with endogenous 
CASTOR1, these mutants also functioned as dominant negatives, rendering 
mTORC1 fully resistant to arginine starvation (Fig. 4d). Thus, 
the CASTOR1-GATOR2 interaction is required to signal arginine 
deprivation to mTORC1. 

Although defined by their common topology, ACT domains 
are highly diverse in sequence and form a wide range of structural 
assemblies (refs 14, 15). Comparison of our structure with other 
ACT-domain-containing proteins in the Protein Data Bank (PDB) revealed that 
CASTOR1 shares substantial structural homology with the allosteric 
regulatory domains of bacterial aspartate kinases, including those 
found in Escherichia coli (AKeco) and cyanobacteria (AKsyn) (refs 20, 21; 
Fig. 5a and Extended Data Fig. 6a). Aspartate kinases catalyse the 
first step of a metabolic pathway that synthesizes several amino acids, 
including lysine, and display allosteric feedback inhibition when downstream 
products bind to their regulatory domains (ref. 22). Notably, AKeco 
binds lysine through pockets that bear a striking resemblance to the 
arginine-binding pocket of CASTOR1 (ref. 20; Fig. 5b). Furthermore, AKeco 
residues Arg305, Glu346, and Val347, which correspond to the positions 
of the critical GATOR2-binding residues Glu261, Tyr118, and 
Gln119, respectively, participate directly in the lysine-dependent inhibition 
of the kinase domain in AKeco (ref. 20; Extended Data Fig. 6b). Thus, 
the overall structure, mode of amino-acid binding and likely allosteric 
mechanism of CASTOR1 all resemble those found in the regulatory 
domains of prokaryotic aspartate kinases. 

Figure 4 | The GATOR2 binding site of CASTOR1 is at the ACT2-ACT4 
interface and is required for signalling arginine deprivation to 
mTORC1. a, The CASTOR1 D292A, E261A, D121A, and YQ118-119AA 
mutants are deficient in GATOR2 binding. HA-immunoprecipitates 
prepared from arginine-starved HEK-293T cells transiently expressing the 
indicated HA-tagged constructs were analysed as in Fig. 1c. b, Solvent-exposed 
surface view of the CASTOR1 homodimer highlighting the 
GATOR2-binding sites (red). Residue E261 is in a partially disordered 
loop and not visible in one monomer (left). c, Cross-sectional view of the 
ACT2-ACT4 interface showing the positions of the critical GATOR2-binding 
residues relative to the bound arginine (yellow) and the β14 loop. 
d, The GATOR2-binding-deficient YQ118-119AA and D121A mutants of 
CASTOR1 fail to inhibit the mTORC1 pathway and render cells insensitive 
to arginine starvation. HEK-293T cells were transiently transfected 
with FLAG-S6K1 and the indicated HA-tagged constructs. 
FLAG-immunoprecipitates were prepared and analysed as in Fig. 1d. e, A model 
of how arginine releases CASTOR1 from GATOR2 to activate mTORC1. 

These similarities suggest that CASTOR1 shares an evolutionary 
origin with prokaryotic aspartate kinases. Aspartate kinases are 
found throughout the bacteria, archaea, and many eukaryotic lineages, 
but were lost before the emergence of metazoa, whereas CASTOR1 
homologues are present only in metazoa (Fig. 5c). Thus, in order to 
acquire arginine sensitivity in early multicellular animals, the mTORC1 
pathway may have taken advantage of this more ancient, lysine-sensitive 
regulatory mechanism (Fig. 5d). This exploitation of a pre-existing 
allosteric module is analogous to the models proposed for the evolution 
of hormone-receptor signalling (ref. 23) and yeast MAP kinases (ref. 24), and 
may enable the more rapid incorporation of novel signalling responses 
into existing pathways (ref. 25). 

Figure 5 | Insights into the evolution of arginine sensing by CASTOR1. 
a, Top, a ribbon view of human CASTOR1 dimer (pink and purple) and 
AKeco dimer (blue and yellow; PDB ID 2J0X). Bottom, a ribbon view of 
the human CASTOR1 monomer (left) and the regulatory domain from 
AKeco (right). b, Comparison of the arginine-binding pocket in human 
CASTOR1 with the lysine-binding pocket in AKeco. Arginine and lysine 
are shown in yellow and orange, respectively. Hydrogen bonds and salt 
bridges are shown as black dashed lines. c, Phylogenetic distribution of 
aspartate kinase (orange) and CASTOR1 homologues (purple). d, Model 
of the evolution of CASTOR1 from the regulatory domain of an ancestral 
aspartate kinase. 

Together, our results provide a structural basis for arginine sensing by 
the mTORC1 pathway. Furthermore, our data obtained using arginine 
analogues suggest that our structure may be useful for predicting compounds 
that can modulate arginine sensing by CASTOR1 in vivo. As the 
deregulation of mTORC1 is common in a number of human diseases, 
including cancer (refs 26, 27), the identification of novel pharmacological 
regulators of mTORC1 activity is of particular interest. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 10 February; accepted 5 July 2016. 
Published online 3 August 2016. 


1. Laplante, M. & Sabatini, D. M. mTOR signaling in growth control and disease. 
Cell 149, 274-293 (2012). 

2. Dibble, C. C. & Manning, B. D. Signal integration by mTORC1 coordinates 
nutrient input with biosynthetic output. Nat. Cell Biol. 15, 555-564 
(2013). 

3. Jewell, J. L., Russell, R. C. & Guan, K. L. Amino acid signalling upstream of 
mTOR. Nat. Rev. Mol. Cell Biol. 14, 133-139 (2013). 

4. Ban, H. et al. Arginine and Leucine regulate p70 S6 kinase and 4E-BP1 in 
intestinal epithelial cells. Int. J. Mol. Med. 13, 537-543 (2004). 

5. Bronte, V. & Zanovello, P. Regulation of immune responses by L-arginine 
metabolism. Nat. Rev. Immunol. 5, 641-654 (2005). 

6. Floyd, J. C., Jr, Fajans, S. S., Conn, J. W., Knopf, R. F. & Rull, J. Stimulation of 
insulin secretion by amino acids. J. Clin. Invest. 45, 1487-1502 (1966). 

7. Yao, K. et al. Dietary arginine supplementation increases mTOR signaling 
activity in skeletal muscle of neonatal pigs. J. Nutr. 138, 867-872 (2008). 

8. Sancak, Y. et al. The Rag GTPases bind raptor and mediate amino acid 
signaling to mTORC1. Science 320, 1496-1501 (2008). 

9. Bar-Peled, L. et al. A Tumor suppressor complex with GAP activity for the Rag 
GTPases that signal amino acid sufficiency to mTORC1. Science 340, 
1100-1106 (2013). 

10. Wang, S. et al. Metabolism. Lysosomal amino acid transporter SLC38A9 signals 
arginine sufficiency to mTORC1. Science 347, 188-194 (2015). 

11. Rebsamen, M. et al. SLC38A9 is a component of the lysosomal amino acid 
sensing machinery that controls mTORC1. Nature 519, 477-481 (2015). 

12. Chantranupong, L. et al. The CASTOR proteins are arginine sensors for the 
mTORC1 pathway. Cell 165, 153-164 (2016). 

13. Aravind, L. & Koonin, E. V. Gleaning non-trivial structural, functional and 
evolutionary information about proteins by iterative database searches. 
J. Mol. Biol. 287, 1023-1040 (1999). 

14. Grant, G. A. The ACT domain: a small molecule binding domain and its role as 
a common regulatory element. J. Biol. Chem. 281, 33825-33829 (2006). 

15. Chipman, D. M. & Shaanan, B. The ACT domain family. Curr. Opin. Struct. Biol. 
11, 694-700 (2001). 

16. Chantranupong, L. et al. The Sestrins interact with GATOR2 to negatively 
regulate the amino-acid-sensing pathway upstream of mTORC1. Cell Reports 
9, 1-8 (2014). 

17. Parmigiani, A. et al. Sestrins inhibit mTORC1 kinase activation through the 
GATOR complex. Cell Reports 9, 1281-1291 (2014). 

18. Wolfson, R. L. et al. Sestrin2 is a leucine sensor for the mTORC1 pathway. 
Science 351, 43-48 (2016). 




19. Saxton, R. A. et al. Structural basis for leucine sensing by the Sestrin2-mTORC1 
pathway. Science 351, 53-58 (2016). 

20. Kotaka, M., Ren, J., Lockyer, M., Hawkins, A. R. & Stammers, D. K. Structures of 
R- and T-state Escherichia coli aspartokinase III. Mechanisms of the allosteric 
transition and inhibition by lysine. J. Biol. Chem. 281, 31544-31552 (2006). 

21. Robin, A. Y. et al. A new mode of dimerization of allosteric enzymes with ACT 
domains revealed by the crystal structure of the aspartate kinase from 
Cyanobacteria. J. Mol. Biol. 399, 283-293 (2010). 

22. Dumas, R., Cobessi, D., Robin, A. Y., Ferrer, J.-L. & Curien, G. The many faces of 
aspartate kinases. Arch. Biochem. Biophys. 519, 186-193 (2012). 

23. Bridgham, J. T., Carroll, S. M. & Thornton, J. W. Evolution of hormone-receptor 
complexity by molecular exploitation. Science 312, 97-101 (2006). 

24. Coyle, S. M., Flores, J. & Lim, W. A. Exploitation of latent allostery enables the 
evolution of new modes of MAP kinase regulation. Cell 154, 875-887 (2013). 

25. Peisajovich, S. G., Garbarino, J. E., Wei, P. & Lim, W. A. Rapid diversification of 
cell signaling phenotypes by modular domain recombination. Science 328, 
368-372 (2010). 

26. Zoncu, R., Efeyan, A. & Sabatini, D. M. mTOR: from growth signal integration to 
cancer, diabetes and ageing. Nat. Rev. Mol. Cell Biol. 12, 21-35 (2011). 

27. Shaw, R. J. & Cantley, L. C. Ras, PI(3)K and mTOR signalling controls tumour 
cell growth. Nature 441, 424-430 (2006). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank all members of the Sabatini and Schwartz 
laboratories for helpful insights. This work is based on research conducted at 
the Northeastern Collaborative Access Team beamlines, which are funded by 
the National Institute of General Medical Sciences from the National Institutes 
of Health (P41 GM103403). The Pilatus 6M detector on 24-ID-C beam line is 
funded by a NIH-ORIP HEI grant (S10 RR029205). This research used resources 
of the Advanced Photon Source, a US Department of Energy (DOE) Office 
of Science User Facility operated for the DOE Office of Science by Argonne 
National Laboratory under contract no. DE-AC02-06CH11357. This work has 
been supported by grants from NIH (R01 CA103866 and AI47389) and the US 
Department of Defense (W81XWH-07-0448) to D.M.S. Fellowship support was 
provided by NIH to L.C. (F31 CA180271). D.M.S. is an investigator of the Howard 
Hughes Medical Institute. 


Author Contributions R.A.S., T.U.S., and D.M.S. designed the research plan. 
R.A.S. performed the experiments with assistance from L.C. and K.E.K. on 
experimental design and interpretation. R.A.S., T.U.S., and D.M.S. wrote the 
manuscript and all authors edited it. 


Author Information Coordinates and structure factors for the X-ray crystal 
structure of CASTOR1 have been deposited in the Protein Data Bank (PDB) 
with accession code 5I2C. Reprints and permissions information is available 
at www.nature.com/reprints. The authors declare competing financial 
interests: details are available in the online version of this paper. Readers are 
welcome to comment on the online version of the paper. Correspondence 
and requests for materials should be addressed to D.M.S. (sabatini@wi.mit.edu) 
or T.U.S. (tus@mit.edu). 


Reviewer Information Nature thanks L. Tong and the other anonymous 
reviewer(s) for their contribution to the peer review of this work. 


11 AUGUST 2016 | VOL 536 | NATURE | 233 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




METHODS 


Materials. Reagents were obtained from the following sources: HRP-labelled 
anti-rabbit secondary antibody from Santa Cruz Biotechnology; antibodies to 
phospho-T389 S6K1, S6K1, Mios and the FLAG epitope from Cell Signalling 
Technology; antibodies to the haemagglutinin epitope from Bethyl laboratories; 
antibody to raptor from Millipore. All antibodies used have been published 
previously!*!°. FLAG-M2 affinity gel and amino acids from Sigma Aldrich; RPMI 
without leucine, arginine, or lysine from Pierce; DMEM from SAFC Biosciences; 
XtremeGene9 and Complete Protease Cocktail from Roche; inactivated fetal calf 
serum (IFS) from Invitrogen; [³H]-labelled arginine from American Radiolabelled 
Chemicals. 

Protein production and purification. Full-length, codon-optimized human 
CASTOR1 was N-terminally fused with a human rhinovirus 3C protease-cleavable 
His10-Arg8-ScSUMO tag and cloned into a pET-Duet-1 bacterial expression 
vector. This vector was transformed into E. coli LOBSTR (DE3) cells (Kerafast)**. 
Cells were grown at 37°C to 0.6 optical density (OD), then protein production 
was induced with 0.2mM IPTG at 18°C for 12-14h. Cells were collected by 
centrifugation at 6,000g, re-suspended in lysis buffer (50 mM potassium phosphate, 
pH 8.0, 500 mM NaCl, 30 mM imidazole, 3 mM β-mercaptoethanol (βME) and 
1mM PMSF) and lysed with a cell disruptor (Constant Systems). The lysate was 
cleared by centrifugation at 10,000g for 20 min. The soluble fraction was incu- 
bated with Ni-Sepharose 6 Fast Flow beads (GE Healthcare) for 30 min on ice. 
After washing of the beads with lysis buffer, the protein was eluted in 250 mM 
imidazole, pH 8.0, 150mM NaCl and 3mM βME. The Ni eluate was diluted 1:1 
with 10mM potassium phosphate, pH 8.0, 0.1 mM EDTA and 1 mM dithioth- 
reitol (DTT), and was subjected to cation-exchange chromatography on a 5 ml 
SP sepharose fast flow column (GE Healthcare) with a linear NaCl gradient. The 
eluted CASTOR1 was then incubated with 3C protease and dialysed overnight at 
4°C into 10mM potassium phosphate, pH 8.0, 150mM NaCl, 0.1mM EDTA and 
1mM DTT, followed by a second cation-exchange chromatography run on an SP 
Sepharose Fast Flow column (GE Healthcare) with a linear NaCl gradient. The 
protein was further purified via size-exclusion chromatography on a Superdex 
S200 16/60 column (GE Healthcare) equilibrated in running buffer (10 mM Tris-HCl, 
pH 8.0, 150mM NaCl, 0.1 mM EDTA and 1mM DTT). Selenomethionine 
(SeMet)-derivatized CASTOR1 was prepared as described previously”’ and purified 
as the native version, except that the reducing-agent concentration (βME and 
DTT) was 5 mM in all buffers. 

Crystallization. Purified CASTOR1 was concentrated to 6 mg/ml and incubated 
in 2mM arginine for >1h before setting crystal trays. Crystals were grown at 
18°C by hanging-drop vapour diffusion with 1 µl of protein at 6 mg/ml mixed with 
an equal volume of reservoir solution containing 0.1 M sodium acetate pH 5.0, 
0.25 M ammonium acetate, and 22.5% PEG 3350. Selenomethionine-derivatized 
CASTOR1 was crystallized in 0.1 M BIS-TRIS pH 5.6, 0.25 M ammonium acetate, 
and 22.5% PEG3350. Crystals were cryoprotected in mother liquor supplemented 
with 20% (v/v) ethylene glycol. 

Data collection and structure determination. Data collection was performed at 
the Advanced Photon Source end station 24-IDC at Argonne National Laboratory, 
at 100K. All data-processing steps were carried out with programs provided 
through SBgrid*°. Data reduction was performed with HKL2000°!. A complete 
native data set was collected to 1.8 Å (at wavelength 0.9792 Å) and a complete 
SeMet data set, at the selenium peak wavelength (0.9792 Å), was collected to 
2.2 Å. The phase problem was solved using single-wavelength anomalous dispersion 
(SAD) and selenium positions were determined in HySS, run as part of the 
PHENIX AutoSol program’, for the SeMet data set (space group P2₁, 4 molecules 
per asymmetric unit). An interpretable 2.2 Å experimental electron density map 
was obtained, and manual model building was carried out in Coot*®. Subsequent 
refinement was carried out with the superior 1.8 Å native data set using 
phenix.refine to a final Rwork/Rfree of 17.2%/20.4%. Ramachandran statistics in the final 
model are 99% favoured, 1% allowed, and 0% outlier. 

Structural analysis. Protein-protein and protein-ligand interfaces were analysed 
using PDBePISA*™. NCBI's Vector Alignment Search Tool (VAST)*° was used to 
identify structurally related proteins in the PDB. The multiple sequence align- 
ment (MSA) was generated in Jalview** with the T-Coffee alignment algorithm”. 
Sequences of CASTOR homologues were obtained via NCBI BLAST searches*. 
All structure figures were made in PyMol*. 

Cell lysis and immunoprecipitation. Cells were rinsed once with ice-cold PBS 
and immediately lysed with Triton lysis buffer (1% Triton, 10mM β-glycerol phosphate, 
10 mM pyrophosphate, 40 mM HEPES pH 7.4, 2.5mM MgCl2 and 1 tablet 
of EDTA-free protease inhibitor (Roche) per 25 ml buffer). The cell lysates were 
cleared by centrifugation at 13,000 rpm at 4°C in a microcentrifuge for 10 min. 
For anti-HA immunoprecipitations, the magnetic anti-HA beads (Pierce) were 
washed three times with lysis buffer. 30 µl of a 50/50 slurry of the affinity gel was 
then added to clarified cell lysates and incubated with rotation for 1h at 4°C. 


Following immunoprecipitation, the beads were washed four times with lysis buffer 
containing 500 mM NaCl. Immunoprecipitated proteins were denatured by the 
addition of 50 µl of sample buffer and boiling for 5 min as described”, resolved by 
8-16% SDS-PAGE, and analysed by immunoblotting. 

For co-transfection experiments in HEK-293T cells, 2.5 million cells were 
plated in 10cm culture dishes. Twenty-four hours later, cells were transfected 
using the polyethylenimine method"! with the pRK5-based cDNA expression 
plasmids indicated in the following amounts: 50 ng CASTOR1-HA (wild-type 
or mutant), 50 ng CASTOR1-FLAG, 1 µg HA-metap2, or 2 ng S6K. For in vitro 
dissociation experiments, 50 ng of wild-type CASTOR1-HA was transfected into 
HEK-293T cells. The total amount of plasmid DNA in each transfection was 
normalized to 5 µg with empty pRK5. 36-48 h after transfection, cells were lysed 
as described above. 
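
The normalization step above amounts to topping up each transfection with empty vector until the total DNA reaches the target. A minimal sketch, assuming amounts are tracked in nanograms; the function name and the example plasmid combination are hypothetical and serve only as an illustration:

```python
def empty_vector_ng(plasmids_ng, total_ng=5000):
    """Return the ng of empty pRK5 needed to bring a transfection to total_ng of DNA."""
    used = sum(plasmids_ng.values())
    if used > total_ng:
        raise ValueError("expression plasmids already exceed the target total")
    return total_ng - used

# Hypothetical mix: 50 ng CASTOR1-HA plus 2 ng S6K, topped up to 5 µg (5000 ng).
mix = {"CASTOR1-HA": 50, "S6K": 2}
print(empty_vector_ng(mix))  # 4948
```
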

For experiments that required amino acid starvation or re-stimulation, cells 
were treated as previously described”. Briefly, cells were incubated in arginine-free 
RPMI for 50 min and then re-stimulated with 500 µM arginine for 10 min. 

Arginine binding assay. Five million HEK-293T cells were plated on a 15cm plate 
four days before the experiment. Twenty-four hours after plating, the cells were 
transfected via the polyethylenimine method with the pRK5-based cDNA expression 
plasmids indicated in the figures in the following amounts: 15 µg FLAG-Rap2A, 
500 ng FLAG-CASTOR1 (wild-type or mutant). The total amount of 
plasmid DNA in each transfection was normalized to 15 µg total DNA with empty 
pRK5. Forty-eight hours after transfection cells were lysed as previously described. 
If multiple samples of the same type were represented in the experiment, the cell 
lysates were combined, mixed, and evenly distributed amongst the relevant tubes. 

Anti-FLAG beads were blocked by rotating in 1 µg/µl bovine serum albumin 
(BSA) for 20 min at 4°C, then washed twice in lysis buffer and re-suspended in an 
equal volume of lysis buffer. 30 µl of bead slurry was added to each of the clarified 
cell lysates and incubated as previously described. After immunoprecipitation, the 
beads were washed as previously and incubated for one hour on ice in cytosolic 
buffer (0.1% Triton, 40 mM HEPES pH 7.4, 10mM NaCl, 150mM KCl, 2.5mM 
MgCl2) with the appropriate amount of [³H]-labelled arginine and cold arginine. 
At the end of one hour, the beads were aspirated dry and rapidly washed three 
times with cytosolic buffer. The beads were aspirated dry again and resuspended 
in 85 µl of cytosolic buffer. Each sample was mixed well and three 10 µl aliquots 
were quantified separately using a TriCarb scintillation counter (PerkinElmer). 
This process was repeated in pairs for each sample, to ensure similar incubation 
and wash times for all samples analysed across different experiments. 

In vitro CASTOR1-GATOR2 dissociation assay with arginine analogues. 
HEK-293T cells were transfected with HA-CASTOR1 constructs as described above. 
48 h after transfection, cells were starved for all amino acids for 50 min, lysed 
and subjected to anti-HA immunoprecipitation as described previously. The 
CASTOR1-GATOR2 complexes immobilized on the haemagglutinin beads were 
washed twice in lysis buffer with 500 mM NaCl, then incubated for 20 min in 
1 ml of cytosolic buffer with 400 µM of the indicated compound. The amount of 
GATOR2 and CASTOR1 that remained bound was assayed by SDS-PAGE and 
immunoblotting as described previously. 

Cell lines and tissue culture. HEK-293T cells were maintained at 37 °C and 
5% CO2 and cultured in DMEM with 10% IFS supplemented with 2 mM glutamine, 
penicillin (100 IU/ml) and streptomycin (100 µg/ml). HEK-293T cells were 
obtained from the American Type Culture Collection (ATCC) and were free of 
mycoplasma contamination. 

Statistical analysis. For the arginine-binding assays, two-tailed t-tests were used 
for comparison between two groups. All comparisons were two-sided, and P values 
of less than 0.005 were considered statistically significant. The data meet the 
assumptions of the test and the variance is similar between groups that are being 
statistically compared. No statistical methods were used to predetermine sample 
size. The experiments were not randomized. The investigators were not blinded 
to allocation during experiments and outcome assessment. 
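
The two-group comparison described above can be illustrated with a minimal sketch of the test statistic. It assumes Student's two-sample t-test with pooled variance, which matches the statement that variance is similar between groups; the function name and the triplicate counts below are hypothetical and invented purely for illustration, not taken from the paper. Only the t statistic and degrees of freedom are computed; a P value would additionally require the t-distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for Student's two-sample t-test.

    Uses the pooled (equal-variance) estimate, so it is only appropriate
    when the two groups have similar variance.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical scintillation counts for three technical replicates each
# (illustrative numbers only).
wild_type = [1020.0, 980.0, 1000.0]
mutant = [310.0, 290.0, 300.0]

t_value, dof = pooled_t_statistic(wild_type, mutant)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

For a two-tailed test, |t| is compared against the critical value of the t-distribution for the chosen significance threshold and the computed degrees of freedom.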

Extended Data Figure 1 | Multiple sequence alignment of CASTOR1 homologues. a, Expanded multiple sequence alignment of CASTOR1 
homologues from various organisms. Positions are coloured white to blue according to increasing sequence identity. Secondary structure features are 
labelled and coloured by ACT domain as in Fig. 1a. 

Extended Data Figure 2 | Dimerization-deficient CASTOR1 mutants 
bind arginine but fail to inhibit mTORC1 in cells. a, The dimerization-deficient 
CASTOR1 Y207S and I202E mutants bind arginine in vitro. 
FLAG-immunoprecipitates prepared from HEK-293T cells transiently 
expressing indicated FLAG-tagged proteins were used in binding assays 
with [³H]arginine as described in the Methods. Unlabelled arginine 
was included as a competitor where indicated. Values are mean ± s.d. 
for three technical replicates from one representative experiment. 
b, Dimerization-deficient CASTOR1 Y207S and I202E mutants fail to 
inhibit mTORC1. HEK-293T cells transiently expressing FLAG-S6K1 
and HA-tagged wild-type, Y207S, or I202E CASTOR1 were starved of 
arginine for 50 min and, where indicated, re-stimulated for 10 min. 
FLAG-immunoprecipitates were prepared from lysates and analysed as in Fig. 1c. 
Phospho-S6K1 was used as an indicator of mTORC1 activity. 

Extended Data Figure 3 | Model of lysine-binding in CASTOR1. 
a, Comparison of the arginine-bound pocket of human CASTOR1 with a 
model of the pocket with lysine in place of arginine. Arginine and lysine 
stick representations are shown in yellow and orange, respectively. The 
distances in the lysine-bound model, 3.8 Å and 5.0 Å, are beyond the range 
of standard hydrogen bonds and salt bridges, respectively. ACT domains 
are labelled as in Fig. 1a. b, Chemical structures of arginine analogues used 
in Fig. 2e. Differences relative to L-arginine are highlighted in orange 
boxes. 

Extended Data Figure 4 | Differences in the arginine-binding capacities 
of CASTOR1 and CASTOR2. a, Multiple sequence alignment of human 
CASTOR1 and CASTOR2, highlighting differences in amino acid 
sequence that are in close proximity to arginine-binding residues in 
CASTOR1. b, The CASTOR1 HHV108-110QNI mutant constitutively 
binds GATOR2 in cells. HEK-293T cells transiently expressing HA-metap2 
or the indicated HA-tagged CASTOR constructs were 
starved of arginine for 50 min and, where indicated, re-stimulated for 
10 min. HA-immunoprecipitates were prepared and analysed as in 
Fig. 1c. c, The CASTOR1 HHV108-110QNI mutant displays reduced 
arginine-binding capacity in vitro. Binding assays were performed with 
the indicated CASTOR1 or CASTOR2 constructs and immunoprecipitates 
analysed as in Fig. 2c. Values are mean ± s.d. for three technical replicates 
from one representative experiment. d, Comparison of the CASTOR1 
HHV108-110QNI mutant and wild-type CASTOR2. HEK-293T cells 
transiently expressing HA-metap2 or the indicated HA-tagged CASTOR1 
or CASTOR2 constructs were starved of arginine for 50 min and, where 
indicated, re-stimulated for 10 min. HA-immunoprecipitates were 
prepared and analysed as in Fig. 1c. 

Extended Data Figure 5 | GATOR2-binding-deficient CASTOR1 
mutants still bind arginine and homodimerize. a, The CASTOR1 
YQ118-119AA, D121A, E261A and D292A mutants bind arginine in vitro. 
FLAG-immunoprecipitates prepared from HEK-293T cells transiently 
expressing indicated FLAG-tagged proteins were used in binding assays 
with [³H]arginine as described in the Methods. Unlabelled arginine was 
included as a competitor where indicated. Values are mean ± s.d. for 
three technical replicates from one representative experiment. b, The 
CASTOR1 YQ118-119AA, D121A, E261A and D292A mutants dimerize 
in cells. HA-immunoprecipitates prepared from HEK-293T cells transiently 
expressing CASTOR1-FLAG and HA-metap2 or the indicated HA-tagged 
CASTOR1 constructs were analysed as in Fig. 1c. 

Extended Data Figure 6 | Similarities between human CASTOR1 
and prokaryotic aspartate kinases. a, Ribbon diagram views of 
human CASTOR1, AKeco (PDB ID: 2J0X) and AKsyn (PDB ID: 3L76), 
highlighting the different modes of dimerization. Aspartate kinases can 
dimerize through an interlocked-ACT domain conformation 
(as in AKeco) or through their kinase domains (AKsyn), both of which are 
distinct from the side-by-side ACT-domain dimerization in CASTOR1. 
b, View of AKeco depicting positions of residues R305, E346, and V347, 
which correspond to the positions of the GATOR2-interacting residues of 
CASTOR1. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


Extended Data Table 1 | Data collection and refinement statistics (SAD) 

                               CASTOR1 + Arg          CASTOR1 + Arg 
                               Native                 SeMet 
Organism                       H. sapiens             H. sapiens 
PDB ID                         5I2C 

Data collection 
Space group                    P2₁                    P2₁ 
Cell dimensions 
  a, b, c (Å)                  91.39, 82.60, 96.67    91.76, 82.35, 96.71 
  α, β, γ (°)                  90, 116.23, 90         90, 116.04, 90 
                                                      Peak 
Wavelength (Å)                 0.9792                 0.9792 
Resolution (Å)                 86.7-1.80              86.89-2.20 
Rsym (%)                       7.2 (62.8)             10.4 (>100) 
I/σI                           25.9 (1.2)             22.6 (1.4) 
Completeness (%)               97.85 (87.1)           98.2 (98.1) 
Redundancy                     3 (2.5)                6.4 (5.9) 
Anomalous completeness (%)                            96.8 

Refinement 
Resolution (Å)                 86.71-1.80 
No. reflections                116,883 
Rwork / Rfree                  17.2% / 20.4% 
No. atoms                      9,872 
  Protein                      9,012 
  Arg                          48 
  Water                        796 
Average B-factors (Å²)         40.2 
  Protein                      40.0 
  Arg                          26.8 
  Water                        46.4 
R.m.s. deviations 
  Bond lengths (Å)             0.007 
  Bond angles (°)              0.85 

*Values in parentheses are for highest-resolution shell. 



LETTER 


doi:10.1038/nature19080 


Reconstruction of bacterial transcription-coupled 
repair at single-molecule resolution 


Jun Fan¹, Mathieu Leroux-Coyau¹, Nigel J. Savery² & Terence R. Strick¹,³,⁴ 


Escherichia coli Mfd translocase enables transcription-coupled 
repair by displacing RNA polymerase (RNAP) stalled on a DNA 
lesion and then coordinating assembly of the UvrAB(C) components 
at the damage site¹⁻⁴. Recent studies have shown that after binding 
to and dislodging stalled RNAP, Mfd remains on the DNA in the 
form of a stable, slowly translocating complex with evicted RNAP 
attached⁵,⁶. Here we find, using a series of single-molecule assays, 
that recruitment of UvrA and UvrAB to Mfd-RNAP arrests the 
translocating complex and causes its dissolution. Correlative 
single-molecule nanomanipulation and fluorescence measurements 
show that dissolution of the complex leads to loss of both RNAP 
and Mfd. Subsequent DNA incision by UvrC is faster than when 
only UvrAB(C) are available, in part because UvrAB binds 
20-200 times more strongly to Mfd-RNAP than to DNA damage. 
These observations provide a quantitative framework for comparing 
complementary DNA repair pathways in vivo. 

The conformational changes that take place in Mfd upon docking to, 
and activation by, stalled RNAP enable it to bind to DNA upstream 
of RNAP and translocate along DNA against stalled RNAP, and 
to expose a UvrB homology module and recruit UvrA. Remarkably, 
single-molecule assays have shown that after displacing stalled RNAP 
to make the lesion accessible for repair, Mfd continues to translocate 
slowly and processively with RNAP attached to it⁵,⁶. These assays help 
explain recent results showing that transcription-coupled repair (TCR) 
can also accelerate repair of damaged sites downstream of the stall 
site. Nevertheless, the role of the translocating Mfd-RNAP complex 
in stimulating repair by UvrAB(C) remains unclear. Here three 
single-molecule assays based on magnetic trapping are brought to 
bear on the system. 

In the tethered-RNAP translocation assay we stalled biotinylated 
RNAP after transcribing 20 bases using only ATP, UTP and GTP 
on a DNA cassette lacking cytosine residues. We then tethered the 
RNAP to a streptavidin-coated magnetic bead, and anchored the 
linear DNA template at one end to a modified glass coverslip. We thus 
obtained an RNAP stalled ~1 kilobase pair (kbp) from one end of an 
~8 kbp DNA as it transcribed towards the distant glass surface 
(Fig. 1a). The DNA was extended away from the surface by a vertical 
force (F = 1 pN) applied to the bead using a pair of magnets located 
above the sample, and the bead's position above the surface was 
detected in real time using computer-aided videomicroscopy. Addition 
of 100 nM Mfd and 2 mM ATP caused motion of the bead towards the 
surface as an Mfd-RNAP complex formed and translocated along 
the DNA (Fig. 1b)⁵,⁶. The complex was Michaelian with respect 
to ATP, with maximum rate $V_{\max}^{\mathrm{ATP}} = 4.7 \pm 0.1$ bp s$^{-1}$ (s.e.m.) and 

Figure 1 | Tethered-RNAP assay for resolution of 
the Mfd-RNAP complex. a, UvrA(B) intercepts, 
arrests and releases translocating Mfd-RNAP- 
bead complexes (see text). b, Time-traces of bead 
position in the presence of 2mM ATP and proteins 
as indicated. Arrows indicate component infusion 
(gaps). c, Single-exponential lifetime distribution 
of arrest events in the presence of UvrA displays a 
mean of 15 ± 3 s (s.e.m., n = 65). Inset time-trace 
shows Mfd-RNAP arrest (black bar) and release 
(red up-arrow) for 10s averaging. Dashed line, 
linear fit to translocation. d, As in c but in the 

Table 1 | Statistics of release of the Mfd-RNAP complex in the 
translocation assay 

                   Release before surface       Dissociation by      Efficiency of 
                   Pausing      No pausing      surface collision    release (%) 
                   detected     detected 
Mfd-RNAP alone     0            33              66                   33 
+UvrA              65           67              43                   75 
+UvrAB             63           208             20                   93 
+UvrB              0            5               65                   7 


Michaelis constant $K_{\mathrm{M}}^{\mathrm{ATP}} = 16 \pm 0.4$ µM (s.e.m.) (Extended Data 
Fig. 1). In about two-thirds of cases the bead translocated ~7,000 bp under 
the action of Mfd and was released only upon collision with the sur- 
face; in the remaining cases it released before reaching the surface 
(Table 1). 
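
The Michaelian dependence on ATP quoted above can be made concrete with a short sketch. It assumes the standard Michaelis-Menten rate law, v = Vmax[S]/(KM + [S]), with the reported values Vmax = 4.7 bp/s and KM = 16 µM used as defaults; the function and variable names are illustrative and not from the paper.

```python
def michaelis_menten_rate(atp_um, v_max_bp_s=4.7, k_m_um=16.0):
    """Translocation rate (bp/s) at a given ATP concentration (µM).

    Assumes simple Michaelis-Menten behaviour with the reported
    V_max (bp/s) and K_M (µM) as default parameters.
    """
    return v_max_bp_s * atp_um / (k_m_um + atp_um)


# At [ATP] = K_M the rate is half-maximal by definition.
rate_at_km = michaelis_menten_rate(16.0)

# At the 2 mM ATP used in the assay the motor is essentially saturated.
rate_at_2mm = michaelis_menten_rate(2000.0)

# Approximate time needed to translocate ~7,000 bp at that rate.
seconds_for_7kbp = 7000 / rate_at_2mm

print(f"{rate_at_km:.2f} bp/s at K_M, {rate_at_2mm:.2f} bp/s at 2 mM ATP")
print(f"about {seconds_for_7kbp:.0f} s to cover 7,000 bp")
```

Under these assumptions a bead that travels the full ~7,000 bp to the surface does so over roughly 25 minutes, which gives a sense of the timescale over which release is scored.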

Addition of 50 pM UvrA to translocating Mfd-RNAP (Fig. 1b) led 
to release of 75% of beads before reaching the surface (Table 1). In 
50% of release events we observed arrest of the translocating complex 
before release (65/132; Table 1). Arrest duration was well described by 
single-exponential kinetics with a mean duration of 15 ± 3 s (s.e.m.; 
Fig. 1c). Accordingly, arrest most probably occurred in the remaining 
50% of events but was too short for us to detect with a slow meas- 
urement response time of ~10s (see Supplementary Information: 
Methods). Addition of 50 pM UvrA and 250 nM UvrB to the translocating 
complex led to release of 93% of beads (Fig. 1b and Table 1). 
Arrest now lasted on average only 6 ± 1 s (s.e.m.; Fig. 1d), and was 
observed in only ~20% of cases (Table 1). As above, shorter periods 
of arrest were predicted to have taken place in all cases. UvrB alone 
failed to destabilize the translocating complex (Fig. 1b and Table 1). 
ATP hydrolysis was required for either UvrA or UvrAB to per- 
form these tasks (Extended Data Fig. 2). These results indicate 
that complex resolution by UvrAB is faster and more efficient than 
by UvrA alone, and that arrest of the complex is on-pathway to its 
disassembly. 
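
The release efficiencies and arrest fractions quoted above follow directly from the event counts in Table 1. The sketch below recomputes them, assuming that efficiency is defined as releases before the surface divided by all events, and reading the two garbled entries in the "pausing detected" column as zero (an assumption that reproduces the tabulated percentages). The dictionary layout and function names are illustrative.

```python
# Event counts: (pausing detected, no pausing detected, surface collision).
# The zeros for "Mfd-RNAP alone" and "+UvrB" are an assumed reading of
# illegible table entries.
TABLE_1 = {
    "Mfd-RNAP alone": (0, 33, 66),
    "+UvrA": (65, 67, 43),
    "+UvrAB": (63, 208, 20),
    "+UvrB": (0, 5, 65),
}


def release_efficiency(pausing, no_pausing, collision):
    """Percentage of beads released before reaching the surface."""
    released = pausing + no_pausing
    return 100.0 * released / (released + collision)


def arrest_fraction(pausing, no_pausing):
    """Fraction of release events in which an arrest was detected."""
    return pausing / (pausing + no_pausing)


efficiencies = {name: round(release_efficiency(*counts))
                for name, counts in TABLE_1.items()}
arrest_uvra = arrest_fraction(65, 67)
arrest_uvrab = arrest_fraction(63, 208)

print(efficiencies)
print(f"arrest detected: {arrest_uvra:.0%} (UvrA), {arrest_uvrab:.0%} (UvrAB)")
```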

We next used a tethered-DNA assay in which a 2 kbp subfragment 
of the DNA used previously was attached to a magnetic bead and a 
coverslip, and extended (F = 0.3 pN) and supercoiled in the magnetic 
trap (Fig. 2a). Figure 2b shows the extension signal obtained when 
RNAP initiated transcription and stalled on a positively supercoiled 
DNA substrate bearing a cyclobutane pyrimidine dimer (CPD), and 
was displaced by Mfd to form the long-lived translocating complex 
(denoted intermediate, I; Fig. 2b, Extended Data Fig. 3 and 
ref. 5). Intermediate lifetimes were long and normally distributed 
(mean of 548 ± 37 s, s.e.m.), and independent of supercoiling or cause of 
stalling (Extended Data Fig. 3), but depended on the distance between 
stalled RNAP and the end of the DNA (compare Extended Data Figs 3 
and 4). Addition of 50 pM UvrA and 250 nM UvrB reduced the mean 
lifetime of the Mfd-RNAP repair intermediate (Fig. 2c) to 141 ± 20 s 
(s.e.m.) in a manner that was essentially unaffected by supercoiling and 
cause of stalling (Extended Data Figs 4 and 5). The lifetime distribution 
of the intermediate species then followed a difference-of-exponentials 
function characteristic of a Michaelis-Menten process with association/dissociation 
of UvrAB to translocating Mfd-RNAP (rates $k_1$ and $k_{-1}$, 
respectively) and a slow forward catalytic rate for resolution of the 
complex, $k_2$. 
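
For reference, the difference-of-exponentials form can be written out explicitly. The following is a sketch under stated assumptions rather than the authors' own derivation: it takes the standard single-molecule treatment of the two-step scheme (reversible binding followed by one irreversible step), writes the UvrAB concentration as $[S]$, and uses $k_1$, $k_{-1}$ and $k_2$ for the association, dissociation and forward rates. The symbols $A$ and $B$ are introduced here for convenience.

```latex
% Lifetime distribution of the intermediate for
%   I + S  <=>  I.S  ->  resolved     (rates k_1, k_{-1}, k_2)
\[
  f(t) = \frac{k_1 k_2 [S]}{2A}
         \left[ e^{-(B-A)t} - e^{-(B+A)t} \right],
\]
\[
  B = \frac{k_1 [S] + k_{-1} + k_2}{2}, \qquad
  A = \sqrt{B^{2} - k_1 k_2 [S]} .
\]
% Because B^2 - A^2 = k_1 k_2 [S], the mean lifetime is
\[
  \langle t \rangle = \frac{2B}{k_1 k_2 [S]}
    = \frac{1}{k_2} + \frac{K_{\mathrm{M}}}{k_2}\,\frac{1}{[S]},
  \qquad K_{\mathrm{M}} = \frac{k_{-1} + k_2}{k_1},
\]
% i.e. the mean lifetime is linear in 1/[S], with intercept 1/k_2.
```

In this form the mean lifetime plotted against inverse concentration is a straight line, which is why fitting mean lifetimes at several UvrA concentrations yields both the maximum rate and the Michaelis constant.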

By titrating UvrA from 50 to 100 pM against a saturating concentration 
of 250 nM UvrB, we observed a gradual reduction in the mean 
lifetime of the Mfd-RNAP-DNA complex as expected for a 
diffusion-limited process (see Fig. 2d and Extended Data Fig. 6). By 
determining the mean lifetimes of intermediates for different 
UvrA concentrations and fitting those average values to a Michaelis-Menten 
model, we obtained $K_{\mathrm{M}}^{\mathrm{UvrAB}} = 96 \pm 51$ pM (s.e.m.) and 
$V_{\max}^{\mathrm{UvrAB}} = 0.023 \pm 0.006$ s$^{-1}$ ($1/V_{\max}^{\mathrm{UvrAB}} = 43 \pm 12$ s (s.e.m.)) for dissolution 
of the Mfd-RNAP complex by UvrAB (Fig. 2d). By globally fitting the 

Figure 2 | Tethered-DNA assay for resolution of the Mfd-RNAP complex. 
a, Transcription complexes can be monitored as the positively supercoiled 
DNA couples local torsional deformation by RNAP into large-scale looping 
(writhe) deformation. b, Left: tethered-DNA time-trace from CPD-bearing 
DNA in the presence of RNAP, Mfd, GreB, UTP, GTP, CTP and 2 mM ATP 
shows formation of initially transcribing complex (RPitc), stalled elongation 
complex (RDe) and Mfd-RNAP repair intermediate (I, underscored by 
black bar) which resolves to baseline. Right: lifetime of intermediate follows 
a Gaussian distribution (red line; n = 28). c, As in b but adding UvrA and 
UvrB (n = 58). Right: red line is the predicted distribution based on kinetic 
constants (main text). d, Intermediate lifetime plotted as a function of 
inverse concentration of UvrA, obtained as in c but for RNAP stalled on a 
positively supercoiled cytosine (C)-less cassette. 


full lifetime distributions of Mfd-RNAP intermediates obtained 
at different UvrA concentrations to the single-molecule limit for 
the Michaelis-Menten equation, using the maximum velocity 
obtained above as a global constraint (Extended Data Fig. 6), we 
estimated the on- and off-rates of the UvrAB complex with respect 
to the Mfd-RNAP intermediate as $k_1 = (7.3 \pm 1.9) \times 10^{8}$ M$^{-1}$ s$^{-1}$ and 
$k_{-1} = 0.037 \pm 0.013$ s$^{-1}$, giving a value for $K_{\mathrm{M}}^{\mathrm{UvrAB}}$ of about 80 pM, in 
agreement with the estimate from average values. Control experiments 
showed that removing Mfd, ATP or UvrA abolishes complex dissolution 
(Extended Data Fig. 7). UvrB could not be removed, however, as 
UvrA, on its own, compacted DNA in a non-specific manner at 
concentrations as low as 10 pM, precluding its analysis unbuffered by 
UvrB in this assay (Extended Data Fig. 8). Importantly, DNA compaction 
by 100 pM UvrA was abolished by addition of 250 nM UvrB 
(Extended Data Fig. 8). 
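
The kinetic constants reported above can be cross-checked numerically. The sketch below assumes the simple two-step Michaelis-Menten scheme (reversible UvrAB binding with rates k_on and k_off, followed by one irreversible resolution step with rate k_cat taken equal to the reported maximum rate of 0.023 s^-1). Under that assumption the Michaelis constant is (k_off + k_cat)/k_on, the dissociation constant is k_off/k_on, and the mean lifetime at concentration [S] is (k_on[S] + k_off + k_cat)/(k_on k_cat [S]). Variable names are illustrative.

```python
K_ON = 7.3e8     # association rate, M^-1 s^-1 (reported fit)
K_OFF = 0.037    # dissociation rate, s^-1 (reported fit)
K_CAT = 0.023    # forward rate, s^-1 (reported maximum rate; assumed = k_cat)

# Michaelis constant and equilibrium dissociation constant, in molar units.
k_m = (K_OFF + K_CAT) / K_ON
k_d = K_OFF / K_ON


def mean_lifetime(uvra_molar, k_on=K_ON, k_off=K_OFF, k_cat=K_CAT):
    """Mean lifetime (s) of the intermediate at a given UvrA concentration."""
    return (k_on * uvra_molar + k_off + k_cat) / (k_on * k_cat * uvra_molar)


lifetime_50pm = mean_lifetime(50e-12)
lifetime_saturating = 1 / K_CAT  # limit of mean_lifetime as [S] grows

print(f"K_M ~ {k_m * 1e12:.0f} pM, K_D ~ {k_d * 1e12:.0f} pM")
print(f"mean lifetime at 50 pM UvrA ~ {lifetime_50pm:.0f} s; "
      f"saturating limit ~ {lifetime_saturating:.0f} s")
```

With these inputs the Michaelis constant comes out near 80 pM and the dissociation constant near 50 pM, and the predicted lifetime at 50 pM UvrA is of the same order as the measured 141 ± 20 s.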

To further determine the fate of the Mfd-RNAP complex upon 
arrest and dissolution by UvrAB we used NanoCOSM⁶, a recently 
developed assay enabling correlative nanomanipulation and fluo- 
rescence co-localization of single molecules. By tracking the Mfd- 
RNAP intermediate via its mechanical signature on DNA as seen 
in the topological assay, and simultaneously using single-molecule 
fluorescence to identify the co-localization of labelled components, 
we could monitor the composition of the repair complex as it pro- 
gressed through TCR. Fluorescently labelled RNAP appeared in the 
fluorescence channel when transcription initiation was observed 

Figure 3 | Correlative single-molecule analysis of Mfd-RNAP handoff 
to UvrAB, and downstream DNA incision by UvrC. Time-traces showing 
simultaneous tethered-DNA assay of Mfd-RNAP repair intermediate 
formation and resolution by UvrAB, and single-molecule fluorescence 
signals from experiments in which (a) RNAP (n = 14) or (b) Mfd (n = 21) 
were fluorescently labelled (Fluo-RNAP or Fluo-Mfd). The traces 
presented were obtained on C-less cassette DNA. c, Tethered-DNA time- 
trace for the TCR incision assay performed on positively supercoiled, 
CPD-bearing DNA. Down-arrows: stalled RNAP at lesion; pre- 
equilibrate with UvrA, UvrB, UvrC, pUC18 DNA, GreB and nucleoside 
5’-triphosphates (NTPs); add 100nM Mfd and 1mM ATP (see text and 
Supplementary Information). Incision time distribution and exponential 
fits for (d, e) TCR and (f, g) GGR on positively or negatively supercoiled 
CPD-containing DNA, respectively. 


nanomechanically and was lost from that channel upon nanome- 
chanical dissolution by UvrAB of the repair intermediate (Fig. 3a and 
Extended Data Fig. 9a). Similarly, fluorescently labelled Mfd appeared 
in the fluorescence channel upon formation of the repair interme- 
diate, and was lost from that channel upon dissolution by UvrAB of 
the repair intermediate (Fig. 3b and Extended Data Fig. 9b). Thus 
dissolution of the stable Mfd-RNAP intermediate by UvrAB involves 
loss of both Mfd and RNAP, indicating they do not act in downstream 
steps of DNA repair. 

We finally used the tethered-DNA assay to measure TCR incision 
rates. DNA incision resulted in an abrupt loss of supercoiling and 
was readily detectable as a sudden increase in end-to-end extension. 
TCR incision was obtained by first stalling RNAP on a CPD, then pre- 
equilibrating the cell with UvrAB(C), and finally rapidly adding Mfd to 
the system (see Fig. 3c). We measured $t_{\mathrm{incision}}$, the time elapsed between 
remodelling of stalled RNAP by Mfd and DNA incision by UvrC. 
Incision times for positively and negatively supercoiled DNA followed 
single-exponential distributions with mean lifetimes of 380 ± 120 s 
(s.e.m., n = 44 events) and 390 ± 70 s (s.e.m., n = 59 events), respectively 
(Fig. 3d, e). This is significantly faster than incision of the 
CPD substrate in the presence of only UvrABC, as in the case of global 
genome repair (GGR) (1,230 ± 195 s (s.e.m.), n = 72, and 1,156 ± 256 s 
(s.e.m.), n = 40, for positive and negative supercoiling, respectively; 
see Fig. 3f, g and Extended Data Fig. 10 for experiments and controls 
removing ATP, UvrAB, the CPD or in which CPD is protected by 
RNAP). Our observation of an enhanced repair rate in this assay, even 
after accounting for GGR rates, is consistent with previous biochemical 
findings. 

the rate at which UvrAB finds the lesion. UvrB-DNA can then recruit 
UvrC, which incises the lesion site. 
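
The size of the TCR speed-up can be estimated from the mean incision times reported in the main text. This is a minimal sketch: it assumes the two measurements in each ratio are independent and uses first-order (quadrature) error propagation on the quoted s.e.m. values, which is an approximation when relative errors are as large as they are here. The function name is illustrative.

```python
import math


def ratio_with_error(numerator, numerator_sem, denominator, denominator_sem):
    """Return (ratio, propagated s.e.m.) for two independent measurements."""
    ratio = numerator / denominator
    relative_error = math.sqrt((numerator_sem / numerator) ** 2
                               + (denominator_sem / denominator) ** 2)
    return ratio, ratio * relative_error


# Mean incision times in seconds: GGR (UvrABC only) versus TCR (with Mfd).
speedup_positive = ratio_with_error(1230, 195, 380, 120)
speedup_negative = ratio_with_error(1156, 256, 390, 70)

for label, (ratio, sem) in (("positive", speedup_positive),
                            ("negative", speedup_negative)):
    print(f"{label} supercoiling: TCR about {ratio:.1f} ± {sem:.1f} times faster")
```

On this estimate, incision in the presence of Mfd is roughly threefold faster than with UvrABC alone for both signs of supercoiling.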

TCR differs from GGR in the mode of recruitment of UvrAB to the 
lesion. UvrAB on its own displays reasonable affinity for DNA damage 
(dissociation constant in the 1-10 nM range). Here we have shown that 
the dissociation constant of UvrAB from the activated Mfd-RNAP complex 
is 20-200 times smaller ($k_{-1}/k_1 \approx 50$ pM). This is partly due to the 
extremely efficient docking of UvrAB to the exposed UvrB homology 
module of the Mfd-RNAP complex ($k_1 \approx 7 \times 10^{8}$ M$^{-1}$ s$^{-1}$, essentially 
diffusion-limited). This efficiency can be explained by the fact that 
the UvrB homology module is larger and more accessible and thus 
easier to 'find' than DNA damage. In this manner, GGR components 
may actively participate in repair even in uninduced, non-SOS conditions, 
where abundance of UvrA is extremely low (in the 20 nM range, 
that is, only about ten dimers per cell) and therefore also subject 
to large fluctuations. The TCR pathway we detail therefore appears 
to be most relevant to 'housekeeping' DNA repair, watchfully maintaining 
genomic integrity even in the absence of stressful or genotoxic 
conditions. 
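
The "20-200 times" comparison above is simple arithmetic on the two dissociation constants, and its practical consequence can be illustrated with a binding isotherm. The sketch below assumes simple one-site equilibrium binding with free ligand in excess (fractional occupancy = [L]/([L] + K_D)); the 50 pM UvrAB concentration is chosen only for illustration, and all names are hypothetical.

```python
KD_MFD_RNAP_PM = 50.0                # UvrAB on activated Mfd-RNAP, pM
KD_DAMAGE_PM = (1_000.0, 10_000.0)   # UvrAB on DNA damage, 1-10 nM in pM


def fractional_occupancy(ligand_pm, kd_pm):
    """Equilibrium occupancy of a single site at a given ligand concentration."""
    return ligand_pm / (ligand_pm + kd_pm)


# How much tighter UvrAB binds Mfd-RNAP than it binds damage directly.
fold_tighter = tuple(kd / KD_MFD_RNAP_PM for kd in KD_DAMAGE_PM)

# Occupancy at an illustrative 50 pM of UvrAB.
UVRAB_PM = 50.0
occupancy_mfd_rnap = fractional_occupancy(UVRAB_PM, KD_MFD_RNAP_PM)
occupancy_damage = tuple(fractional_occupancy(UVRAB_PM, kd)
                         for kd in KD_DAMAGE_PM)

print(f"fold difference: {fold_tighter[0]:.0f} to {fold_tighter[1]:.0f}")
print(f"occupancy on Mfd-RNAP: {occupancy_mfd_rnap:.2f}")
print(f"occupancy on damage: {occupancy_damage[1]:.3f} to {occupancy_damage[0]:.3f}")
```

Under these assumptions a target with a 50 pM dissociation constant is half-occupied at a concentration where a damage site would be occupied only a few per cent of the time or less.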

Our observations suggest the existence of a transient UvrB-UvrA- 
UvrA-Mfd-RNAP repair complex, which would convert into a UvrB- 
UvrA-UvrA complex after loss of Mfd and RNAP (see model in Fig. 4). 
Intriguingly, this complex is able to drive repair only of the transcribed 
strand of DNA, the hallmark of TCR. This suggests the complex, 
once loaded onto the DNA, does not ‘pick up’ a second UvrB. The 
single-molecule methods used here provide us not only with both 
broad and detailed views of TCR, but also with the opportunity to 
pursue these advanced mechanistic questions. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

DNA constructs. Nanomanipulation constructs bearing the T5 N25 promoter 
followed by either a C-less cassette (with the first C on the coding strand located 
at position +20 from the transcription start site, or TSS) or a CPD (at position 
+20 from the TSS and on the transcribed strand) were 
constructed as described previously, but with one modification. Specifically, 
enzyme reactions used to prepare the constructs (restriction and ligation) were 
not heat-inactivated, but rather were purified of protein by use of a spin column 
for DNA purification (Macherey-Nagel). The transcription unit is 5′-TTGCTTT 
CAGGAAAATTTTTCTGTATAATAAGCTTATAAATTTGAGAGAGGAGAC 
CAAATATGGCTGGTTCTCCACTAGTTCCGAATAG-3′, where the −35 and 
−10 promoter elements are underlined, the +1 TSS is in bold type, and the first C 
on which RNAP stalls at +20 is in bold type and underlined. 
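
Because the underlining and bold type referred to above do not survive in plain text, the position of the first C can be checked directly from the sequence. This is a sketch under one explicit assumption: the +1 TSS is taken to lie seven nucleotides downstream of the last base of the −10 hexamer TATAAT (a six-nucleotide spacer), which is not stated in the text and is chosen because it is a typical spacing. Variable names are illustrative.

```python
# Transcription unit as printed in the Methods (non-template strand, 5' to 3').
TRANSCRIPTION_UNIT = (
    "TTGCTTT"
    "CAGGAAAATTTTTCTGTATAATAAGCTTATAAATTTGAGAGAGGAGAC"
    "CAAATATGGCTGGTTCTCCACTAGTTCCGAATAG"
)

MINUS_10_ELEMENT = "TATAAT"
SPACER_TO_TSS = 6  # assumed number of bases between the -10 element and +1

# Locate the -10 element and derive the assumed index of the +1 position.
minus_10_start = TRANSCRIPTION_UNIT.find(MINUS_10_ELEMENT)
tss_index = minus_10_start + len(MINUS_10_ELEMENT) + SPACER_TO_TSS

# First cytosine at or downstream of the TSS, in 1-based transcript numbering.
first_c_index = TRANSCRIPTION_UNIT.find("C", tss_index)
first_c_position = first_c_index - tss_index + 1

# Sequence that can be transcribed without CTP.
c_less_region = TRANSCRIPTION_UNIT[tss_index:first_c_index]

print(f"first C at +{first_c_position}")
print(f"C-less initial region: {c_less_region}")
```

With this choice of TSS the first C falls at +20, in agreement with the description of the cassette, and the region upstream of it contains only A, T and G.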

DNA nanomanipulation constructs used for tethered-RNAP translocation 
assays (as Fig. 1) were ~8 kbp long and contained the transcription unit at the 
DNA end distal to the surface and oriented in such a way as to direct initial 
transcription towards the surface. 

DNA nanomanipulation constructs used for the tethered-DNA supercoiling 
assays (as Fig. 2) were all 2 kbp long and contained the transcription unit as a cen- 
trally located cassette, except for experiments in Extended Data Fig. 4 employing 
negatively supercoiled DNA bearing a C-less cassette, in which case a shortened, 
1 kbp long DNA construct containing the transcription unit as a centrally located 
cassette was used for its enhanced spatial resolution. 

DNA nanomanipulation constructs used for NanoCOSM assays (as Fig. 3) were 
all 3 kbp long and contained the transcription unit located ~400 bp from the 
coverslip surface and oriented so as to direct initial transcription towards the surface. 

Proteins. E. coli RNAP, σ⁷⁰, GreB and Mfd, as well as SNAP-tagged RNAP 
(SNAP-RNAP) and SNAP-tagged Mfd (SNAP-Mfd) were purified as previously 
described. Core RNAP was saturated with a threefold excess of σ⁷⁰ to maintain 
polymerase in the holoenzyme form. SNAP-tagged proteins were labelled with 
BG-DY549 dye (New England Biolabs) as previously described. UvrA, UvrB and 
UvrC proteins were purified via nickel-affinity chromatography as previously 
described, with the following modifications. 

UvrA purified via nickel-affinity chromatography was then diluted sixfold with 
heparin buffer A (10 mM Tris-Cl pH 7.5, 50mM KCl, 1mM EDTA, 1mM DTT, 
5% glycerol) and loaded onto 10 ml of heparin resin (HiTrap Heparin, GE 
Healthcare) equilibrated in heparin buffer A. UvrA was eluted from the heparin 
resin by developing a gradient to 1 M KCl, concentrated as necessary to ~5 ml 
(10,000 MWCO Vivaspin 20, GE Healthcare) and then gel-filtrated (Superdex 
HiLoad 200 16/60, GE Healthcare) in GF buffer (10 mM TrisCl pH 8, 200 mM KCl, 
1mM EDTA, 2mM DTT, 5% glycerol) before overnight dialysis into GF buffer 
containing 50% glycerol, aliquoting and snap-freezing in LN2. 

UvrB purified via nickel-affinity chromatography was similarly diluted 
sevenfold with heparin buffer A to a conductivity of ~6 mS cm⁻¹ before being 
loaded onto 10 ml of heparin resin equilibrated in heparin buffer A and eluted by 
developing a gradient to 1 M KCl. Peak fractions were diluted to a conductivity 
of ~6 mS cm⁻¹ using heparin buffer A, loaded onto 1 ml of anion exchange resin 
(MonoQ 5/50 GL, GE Healthcare) equilibrated in heparin buffer A, and eluted by 
developing a gradient to 1 M KCl. Peak fractions were pooled and concentrated 
as necessary to ~5 ml (10,000 MWCO Vivaspin 20, GE Healthcare) and then 
gel-filtrated (Superdex HiLoad 200 16/60, GE Healthcare) in GF buffer before 
overnight dialysis into GF buffer containing 50% glycerol, aliquoting and 
snap-freezing in LN2. 

UvrC purified via nickel-affinity chromatography was concentrated as 
necessary to ~5 ml (10,000 MWCO Vivaspin 20, GE Healthcare) and gel-filtrated 
(Superdex HiLoad 200 16/60, GE Healthcare) in GF buffer UvrC (10 mM 
TrisCl pH 7.5, 350mM KCl, 1mM EDTA, 5% glycerol, 2mM DTT) before 
overnight dialysis into GF buffer UvrC containing 50% glycerol, aliquoting and 
snap-freezing in LN2. 

Protein concentrations were determined using the Folin-Lowry assay. Protein 
preparations were free of non-specific nuclease activity as determined by the 
absence of incision of supercoiled plasmid DNA in overnight reactions at 37°C in 
standard repair buffer (RB; see below and refs 5, 6) and in large excess of protein. 
The fully reconstituted UvrABC system was observed to specifically nick ultravi- 
olet-irradiated plasmid DNA as expected. 

Glass surfaces. Surfaces used in single-molecule nanomanipulation assays were 
derivatized with anti-digoxigenin, and surfaces used in correlative NanoCOSM 
assays were derivatized with streptavidin. 

Reaction conditions. Experiments were performed at 34 °C in repair buffer (RB) 
containing 40 mM K-HEPES pH 8.0, 100 mM KCl, 8 mM MgCl2, 0.5 mg/ml BSA, 
0.1% w/v Tween 20, and 10 mM β-mercaptoethanol (adapted from refs 15, 24). 
Unless specified otherwise, concentrations of components if present in reactions 
were as follows: 10-25 pM RNAP holoenzyme, 50 pM UvrA, 250 nM UvrB, 
100 pM UvrC, 100 pM pUC18 competitor DNA, 500 nM GreB, 100 nM Mfd, 
2 mM ATP and 200 µM each of UTP and GTP (for DNA bearing a C-less cassette) 
or 200 µM each of UTP, GTP and CTP (for DNA bearing a CPD). NanoCOSM 
assays were conducted at 27 °C. Collecting the specified number of events, n, 
typically requires more than three experimental runs involving usually 5-20 DNA 
molecules simultaneously (technical replicates). 

Tethered-RNAP assays. Tethered-RNAP translocation assays (as Fig. 1) were performed 
as described previously. In these assays a biotinylated RNAP is loaded 
at one end of an ~8kbp DNA construct using the T5 N25 promoter, and stalled 
using the C-less cassette located at +20 from the TSS. The stalled RNAP-DNA 
construct is then bound to the standard streptavidin-coated magnetic beads used 
in all these assays (MyOne Streptavidin C1, Life Technologies). When deposited 
on an anti-digoxigenin-coated glass surface, the end of the ~8 kbp construct distal 
to the stalled RNAP, and which bears multiple digoxigenin groups, binds to the 
surface. The DNA is gently extended away from the surface using a low force 
(F=1 pN). Displacement of stalled RNAP and formation of the translocating 
Mfd-RNAP complex is initiated by adding Mfd (100 nM) and ATP (2 mM). 

Tethered-DNA assays. Tethered-DNA nanomanipulation experiments (as Fig. 2) 
were performed according to the procedures detailed in refs 15, 21, 22, 25. Standard 
reactions contained the following: extended, supercoiled DNA (F = 0.3 pN, 
superhelical density σ = ±0.021 for experiments using positively or negatively supercoiled 
DNA, respectively), 10-25 pM RNAP holoenzyme, 500 nM GreB, 100 nM Mfd, and 
2 mM ATP and 200 µM each of UTP and GTP (for DNA bearing a C-less cassette) 
or 2 mM ATP and 200 µM each of UTP, GTP and CTP (for DNA bearing a CPD). 
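
The superhelical density quoted above translates into a number of magnet turns that depends on DNA length. The sketch below assumes the usual definition, σ = ΔLk/Lk0 with Lk0 = N/h, and a B-DNA helical repeat of h = 10.5 bp per turn; the helical repeat is an assumption not stated in the text, and the names are illustrative.

```python
HELICAL_REPEAT_BP = 10.5  # assumed B-DNA helical repeat (bp per turn)


def turns_for_sigma(length_bp, sigma, helical_repeat_bp=HELICAL_REPEAT_BP):
    """Number of turns to apply to reach a target superhelical density."""
    relaxed_linking_number = length_bp / helical_repeat_bp
    return sigma * relaxed_linking_number


turns_2kbp = turns_for_sigma(2000, 0.021)
turns_3kbp = turns_for_sigma(3000, 0.021)

print(f"2 kbp DNA: {turns_2kbp:.1f} turns; 3 kbp DNA: {turns_3kbp:.1f} turns")
```

These values agree with the turn counts given for the 2 kbp and 3 kbp substrates in the NanoCOSM section of the Methods (4 and 6 turns, respectively, of either sign).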

Two methodologies, pulse-chase and continuous tracking, were used for 
these measurements. As previously shown, the methodologies are absolutely 
equivalent in terms of quantitative analysis and simply represent optimizations of 
experiments that can have characteristically long or short timescales, respectively, 
as detailed below. 

Pulse-chase methodology. Single-round 'pulse-chase' assays, in which a single 
RNAP is stalled on DNA before free components are washed out and downstream 
components are flowed in, are optimal for the observation of long-lived repair 
intermediates, which must not be interrupted by reloading of a new RNAP 
molecule. Thus time-traces from pulse-chase experiments typically display a gap in the 
tracking data corresponding to the moment of component injection. 

To stall RNAP on DNA we first injected 25 pM RNAP holoenzyme, 
500 nM GreB and 200 µM nucleotides (ATP, UTP and GTP for experiments on 
DNA bearing a C-less cassette, and all four nucleotides for experiments on DNA 
bearing a CPD). Upon loading and stalling of RNAP, we washed out free RNAP 
holoenzyme while maintaining GreB and NTPs in solution. We then added 100 nM 
Mfd and 2 mM ATP to the reaction chamber to allow the reaction to begin with 
displacement of stalled RNAP. 

This methodology was typically used to generate the quantitative kinetic data
in which UvrAB are absent: Fig. 2b (time distribution) and Extended Data Figs 3
and 7c. This methodology was also used to preload RNAP for TCR incision rate
measurements presented in Fig. 3c-e, in which case UvrA, UvrB and UvrC
components were added as described below.
Continuous-tracking methodology. Continuous tracking assays, in which all
components are simultaneously present in solution, are optimal for the statistical
observation of short-lived repair intermediates which are only rarely interrupted
by reloading of a new RNAP molecule. Here we injected 10-25 pM RNAP
holoenzyme, 500 nM GreB, 2 mM ATP and 200 µM nucleotides (UTP and GTP for
experiments performed using the C-less cassette, and UTP, GTP and CTP for
experiments performed using a CPD), and UvrA and UvrB as specified.

This methodology was typically used to generate uninterrupted time-traces 
for presentation such as in Fig. 2b, c (time-traces) as well as the quantitative data 
shown in Figs 2c, d and 3a, b and Extended Data Figs 4-6 and 9. 

NanoCOSM assays. NanoCOSM analysis is described in detail in ref. 6. Briefly, it is
based on tethered-DNA experiments performed using a magnetic trap microscope
into which a total internal reflection, or evanescent, field has been introduced (ref. 26).
For these assays combining topological measurement on torsionally constrained
DNA and single-molecule fluorescence, a slightly longer DNA backbone (3 kbp)
is employed to prevent the autofluorescent bead from entering too far into the
evanescent field used to excite the fluorophore label. The transcription cassette is
unchanged. DNA superhelical density is held constant at |σ| = 0.021, but for this
longer DNA this corresponds to ±6 turns of the DNA in the magnetic trap rather
than the ±4 employed for 2 kbp DNA substrates. In these experiments, one
fluorescent component was tested at a time. When used, SNAP-RNAP was at 50 pM and
SNAP-Mfd was at 2.5 nM. All components other than the fluorescent one were
present at the same concentrations as for standard tethered-DNA assays described
above, except that GreB was omitted and the bead and surface chemistries were
inverted (that is, streptavidin-modified glass surface and anti-digoxigenin-coated
magnetic bead). NanoCOSM assays were performed using the continuous-tracking
methodology.
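
The turn counts quoted above for the 2 kbp and 3 kbp substrates can be checked with a short calculation. The sketch below assumes a relaxed B-DNA helical repeat of 10.5 bp per turn, a textbook value that is not stated in this section; the function name is illustrative only.

```python
# Helical repeat of relaxed B-DNA in base pairs per turn.
# Assumption: textbook value, not given in the Methods text.
HELICAL_REPEAT_BP = 10.5


def turns_for_density(length_bp: float, sigma: float) -> float:
    """Return the number of added turns giving superhelical density sigma."""
    relaxed_turns = length_bp / HELICAL_REPEAT_BP  # linking number of relaxed DNA
    return sigma * relaxed_turns


turns_2kbp = turns_for_density(2000, 0.021)
turns_3kbp = turns_for_density(3000, 0.021)
print(f"2 kbp: {turns_2kbp:.1f} turns, 3 kbp: {turns_3kbp:.1f} turns")
```

Under this assumption, a superhelical density of magnitude 0.021 corresponds to about 4 turns on 2 kbp DNA and about 6 turns on 3 kbp DNA, in line with the values given in the text.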
UvrABC (GGR) incision assay. Incision by UvrABC of positively or negatively 
supercoiled DNA bearing either a C-less cassette or a CPD was performed 
in the presence of 1nM UvrA, 250nM UvrB, 100 pM UvrC, 2mM ATP and 
100 pM of competitor DNA (pUC18, used to reduce non-specific interactions 
between protein components and DNA). 

Incision by UvrABC of positively supercoiled DNA bearing a CPD protected by 
a stalled RNAP was performed by first stalling RNAP on nanomanipulated DNA 
(pulse-chase methodology as above) and then flowing in 1 nM UvrA, 250nM UvrB, 
100 pM UvrC, 2mM ATP and 100 pM of competitor DNA. 
Mfd-RNAP-UvrABC (TCR) incision assay. The TCR assay for positively or 
negatively supercoiled DNA bearing a CPD was performed using the single-round 
pulse-chase methodology described above. First, we stalled RNAP on the CPD 
by equilibrating the reaction cell with 20 pM RNAP, 200 µM of each of the four 
nucleotides, and 500 nM GreB. We then washed out free components while 
maintaining GreB and NTPs in solution. We then supplemented the reaction 
chamber with 1nM UvrA, 250nM UvrB, 100 pM UvrC, 2mM ATP and 
100 pM pUC18 competitor DNA, and then further supplemented the reaction 
chamber with 100nM Mfd to displace stalled RNAP and initiate the TCR reaction. 
Data acquisition and analysis. Nanomanipulation data were collected on 
homebuilt magnetic traps running the Picotwist software suite for trap control 
and particle tracking and analysis (PicoTwist). Raw nanomanipulation data 
representing the magnetic bead position as observed under red illumination 
(650 nm) were collected at video rate (31 Hz, green points in time-traces) using a 
JAI CCD camera, and were filtered at ~1s for analysis (red line in time-traces). 
Fluorescence data co-localized to the magnetic beads were collected using the 
Solis software suite provided by the EMCCD manufacturer (Andor) under 
532 nm strobed illumination conditions (0.5 s illumination every 5s), and were 
synchronized to the nanomanipulation data using dedicated timing trigger pulses 
generated by the programmable counters of the CCD camera. 
Estimation of the fraction of Mfd-RNAP arrest events too short to be observed.
To estimate the fraction of UvrA- or UvrAB-mediated Mfd-RNAP arrest events
that are too short to observe, we first characterize the instrument response time.
The magnetic bead’s vertical RMS fluctuations in the tethered-RNAP assay,
where the extending force F ≈ 1 pN, are ~15 nm. Given that the velocity of the
Mfd-RNAP complex is only about 1.5 nm s⁻¹, we find an instrument response
time of approximately 10 s. As events shorter than this may not be detectable, we
estimate that the fraction of events that follow a single-exponential distribution
with a mean of 15 s, but have a duration shorter than 10 s, is of order 50%. Similarly,
we estimate that the fraction of events that follow a single-exponential distribution
with a mean of 6 s, but have a duration shorter than 10 s, is of order 80%. These
values are in good agreement with observed fractions of missing arrest events,
leading us to conclude that UvrA or UvrAB always arrest the translocating complex
before dissociating it from the DNA.
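
The estimate above can be reproduced in a few lines. This is a minimal sketch, assuming that the response time is simply the RMS noise divided by the translocation velocity and that event durations follow a single-exponential distribution, as stated in the text; the function and variable names are illustrative.

```python
import math


def fraction_shorter_than(cutoff_s: float, mean_s: float) -> float:
    """Fraction of exponentially distributed events shorter than cutoff_s."""
    return 1.0 - math.exp(-cutoff_s / mean_s)


rms_noise_nm = 15.0        # vertical RMS fluctuation of the bead
velocity_nm_per_s = 1.5    # velocity of the Mfd-RNAP complex

# Time needed for the complex to move by one RMS noise amplitude.
response_time_s = rms_noise_nm / velocity_nm_per_s

missed_mean_15s = fraction_shorter_than(response_time_s, 15.0)
missed_mean_6s = fraction_shorter_than(response_time_s, 6.0)
print(f"response time: {response_time_s:.0f} s")
print(f"missed fraction (mean 15 s): {missed_mean_15s:.2f}")
print(f"missed fraction (mean 6 s): {missed_mean_6s:.2f}")
```

The two fractions evaluate to roughly 0.49 and 0.81, which is where the "of order 50%" and "of order 80%" figures come from.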


22. Revyakin, A., Ebright, R. H. & Strick, T. R. Single-molecule DNA 
nanomanipulation: improved resolution through use of shorter DNA 
fragments. Nature Methods 2, 127-138 (2005). 

23. Manelyte, L. et al. The unstructured C-terminal extension of UvrD interacts with 
UvrB, but is dispensable for nucleotide excision repair. DNA Repair 8, 
1300-1310 (2009). 

24. Smith, A. J. & Savery, N. J. RNA polymerase mutants defective in the 
initiation of transcription-coupled DNA repair. Nucleic Acids Res. 33, 
755-764 (2005). 

25. Revyakin, A., Allemand, J. F., Croquette, V., Ebright, R. H. & Strick, T. R. 
Single-molecule DNA nanomanipulation: detection of promoter-unwinding 
events by RNA polymerase. Methods Enzymol. 370, 577-598 (2003). 

26. Duboc, C., Graves, E. T. & Strick, T. R. Simple calibration of TIR field depth using 
the supercoiling response of DNA. Methods 105, 56-61 (2016). 

27. Verhoeven, E. E. A., Wyman, C., Moolenaar, G. F., Hoeijmakers, J. H. J. &
Goosen, N. Architecture of nucleotide excision repair complexes: DNA is
wrapped by UvrB before and after damage recognition. EMBO J. 20,
601-611 (2001).

28. van den Broek, B., Noom, M. C. & Wuite, G. J. L. DNA-tension dependence of 
restriction enzyme activity reveals mechanochemical properties of the reaction 
pathway. Nucleic Acids Res. 33, 2676-2684 (2005). 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


[Figure: a, ‘Translocation velocity’ histogram, x axis velocity (bp/s). b, Plot of 1/velocity (s/bp) against 1/[ATP] (µM⁻¹).]

Extended Data Figure 1 | Motor properties of the translocating
Mfd-RNAP complex as seen in the tethered-RNAP assay. a, Velocity
distribution of translocating Mfd-RNAP in the presence of 2 mM ATP and
under a weak opposing load (F = 1 pN). Velocity is measured by fitting a
single line segment to an entire translocation time-trace; this is made
possible by the fact that velocity is essentially constant over the ~7,000 bp
of displacement which constitutes an entire trajectory. The velocity
distribution is fitted to a Gaussian, giving a mean velocity of 4.7 ± 1 bp s⁻¹
(SD; n = 99 trajectories). b, Tau plot of inverse velocity of translocating
Mfd-RNAP as a function of inverse ATP concentration is well fitted to a
line, indicating Michaelian behaviour with K_M for ATP of 16 ± 0.4 µM (s.e.m.)
and V_max of 4.7 ± 0.1 bp s⁻¹ (s.e.m.). Error bars, s.e.m. as determined from
at least ten trajectories for each ATP concentration.
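
The straight-line fit of inverse velocity against inverse ATP concentration described in panel b can be illustrated with a small sketch. The velocities below are synthetic, noise-free values generated from the fitted parameters quoted in the caption; they are not the measured data, and the ATP concentrations and helper names are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Parameters taken from the caption; used only to generate synthetic points.
K_M_UM = 16.0      # Michaelis constant for ATP, in µM
V_MAX_BP_S = 4.7   # maximum velocity, in bp/s

# Hypothetical ATP concentrations in µM (not the ones used in the paper).
atp_um = [5.0, 10.0, 20.0, 50.0, 200.0, 2000.0]
velocity = [V_MAX_BP_S * a / (K_M_UM + a) for a in atp_um]

# Michaelis-Menten in double-reciprocal form:
# 1/v = 1/V_max + (K_M / V_max) * (1/[ATP])
inv_atp = [1.0 / a for a in atp_um]
inv_velocity = [1.0 / v for v in velocity]
slope, intercept = fit_line(inv_atp, inv_velocity)

v_max_fit = 1.0 / intercept
k_m_fit = slope * v_max_fit
print(f"V_max = {v_max_fit:.2f} bp/s, K_M = {k_m_fit:.1f} µM")
```

The intercept of the line gives 1/V_max and the slope gives K_M/V_max, so both parameters follow from a single linear fit.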
Extended Data Figure 2 | Resolution of the translocating Mfd-RNAP
complex by UvrA or by UvrAB is ATP-dependent as shown by the
tethered-RNAP translocation assay. Down-arrows indicate addition of
components as noted and as follows. Beginning with stalled RNAP, we
add 100 nM Mfd and 2 mM ATP to form the translocating Mfd-RNAP
complex. A wash step using 5 ml of reaction buffer lacking ATP is applied
to remove (nearly) all the free ATP in solution, causing the translocating
complex to come to a nearly complete halt. Then, (a, b) 50 pM UvrA or
(c, d) 50 pM UvrA and 250 nM UvrB is added to the experiment. The
complex is stable and release of the magnetic bead is not observed. Further
addition of ATP-γ-S (2 mM; see b, d) does not permit bead release.
However, final addition of ATP (2 mM) leads to rapid release.
Red up-arrows indicate bead release in b, d.
Extended Data Figure 3 | Characterization of the long-lived
Mfd-RNAP intermediate on 2 kb DNA using the tethered-DNA assay.
a, b, Nanomanipulation time-traces showing pulse-chase measurement of
the lifetime of the Mfd-RNAP intermediate for CPD-bearing DNA under
conditions of positive (+sc) and negative (−sc) supercoiling, respectively.
Down-arrows indicate moments of component addition as noted and
as follows. First, we load RNAP onto DNA in standard conditions
(25 pM RNAP holoenzyme, 500 nM GreB, and the appropriate nucleotide
complement, each present at 200 µM). We then wash out free RNAP
with reaction buffer supplemented with 500 nM GreB and the nucleotide
complement. We then initiate formation of the intermediate by infusion
of the above wash solution supplemented with 100 nM Mfd and 2 mM ATP.
For negatively supercoiled DNA, RNAP was loaded under conditions
of positive supercoiling before the DNA was returned to negative
supercoiling; blue line indicates when DNA supercoiling is changed.
Black bar highlights the intermediate state. c, d, Lifetime distributions
for the Mfd-RNAP intermediate formed on CPD-bearing DNA under
conditions of positive or negative supercoiling, respectively, are well fitted
to Gaussian distributions (red lines). For positive supercoiling the mean
lifetime of the repair intermediate is 548 ± 37 s (s.e.m., n = 29 events; this
distribution is also presented in Fig. 2b), and for negative supercoiling
the mean lifetime of the repair intermediate is 556 ± 33 s (s.e.m., n = 21
events). e, f, Lifetime distributions for the Mfd-RNAP intermediate
formed on C-less cassette DNA under conditions of positive or negative
supercoiling, respectively, are also well fitted to Gaussian distributions.
For positive supercoiling the mean lifetime of the repair intermediate is
649 ± 13 s (n = 98 events) and for negative supercoiling the mean lifetime
of the repair intermediate is 524 ± 26 s (n = 46 events). Data were fitted
over the range delimited by the red lines.
Extended Data Figure 4 | Characterization of the Mfd-RNAP
intermediate on negatively supercoiled 1 kb DNA in the absence or
presence of UvrAB using the tethered-DNA assay. DNA bears a C-less
cassette. a, Nanomanipulation time-trace obtained in continuous-tracking
mode in the presence of 25 pM RNAP holoenzyme, 100 nM Mfd,
500 nM GreB, 2 mM ATP, 200 µM UTP and 200 µM GTP. b, As in a but in the
added presence of 50 pM UvrA and 250 nM UvrB. c, Lifetime distribution
of the Mfd-RNAP intermediate in the absence of UvrA and UvrB has a
mean lifetime of 258 ± 17 s (s.e.m., n = 33 events). d, Lifetime distribution
of the Mfd-RNAP intermediate in the added presence of 50 pM UvrA and
250 nM UvrB has a mean lifetime of 167 ± 17 s (s.e.m., n = 32 events). Data
sets for kinetics were obtained using the pulse-chase methodology.
Extended Data Figure 5 | Reduced lifetime of the Mfd-RNAP
intermediate in the presence of UvrAB, monitored on 2 kb DNA using
the tethered-DNA assay. a, Nanomanipulation time-trace for positively
supercoiled DNA (+sc) bearing a C-less cassette in the presence of 25 pM
RNAP holoenzyme, 100 nM Mfd, 50 pM UvrA, 250 nM UvrB, 500 nM
GreB, 2 mM ATP, 200 µM UTP and 200 µM GTP (continuous-tracking
methodology). b, Nanomanipulation time-trace obtained as in a but for
negatively supercoiled DNA (−sc). c-f, Lifetime distributions for the
Mfd-RNAP intermediate in the presence of UvrA and UvrB as above are
essentially independent of both the cause of RNAP stalling (either a C-less
cassette or a CPD) and supercoiling of the DNA (positive or negative). For
positively supercoiled template the mean lifetime observed using DNA
bearing a C-less cassette is 132 ± 7 s (s.e.m., n = 210 (c), see overview
in Extended Data Fig. 6d) and for DNA bearing a CPD it is 141 ± 20 s
(s.e.m., n = 58 (e)). For negatively supercoiled template the mean lifetime
observed using DNA bearing a C-less cassette is 132 ± 13 s (s.e.m., n = 65
(d)) and for DNA bearing a CPD it is 157 ± 65 s (s.e.m., n = 10 (f)).
Extended Data Figure 6 | Lifetime distributions of the Mfd-RNAP
intermediate as a function of UvrA concentration, using the
tethered-DNA assay. The DNA substrate used in these experiments was positively
supercoiled and contained a C-less cassette, and data were collected using
the continuous-tracking methodology in the presence of 10-20 pM RNAP
holoenzyme, 500 nM GreB, 100 nM Mfd, 2 mM ATP, 200 µM UTP, 200 µM
GTP and 250 nM UvrB. The UvrA concentration was (a) 50 pM, (b) 75 pM
and (c) 100 pM. Red lines show the result of global fitting to a
difference of two exponentials characteristic of a Michaelis-Menten
process, using the rate-limiting forward catalytic step extracted from
classical Michaelian analysis of the mean times (Fig. 2d) as an additional
constraint. d, Overview of lifetimes of the Mfd-RNAP complex measured
with the tethered-DNA assay and as a function of template supercoiling,
cause of RNAP stalling and UvrA concentration, as presented in this paper.
UvrB was fixed at 250 nM throughout.
Extended Data Figure 7 | Mfd-RNAP-UvrAB control experiments,
using the tethered-DNA assay. These experiments were performed on
positively supercoiled DNA substrate (+sc) bearing a C-less cassette and
using the standard pulse-chase methodology. a, No ATP control: ATP
dependence of UvrAB remodelling of Mfd-RNAP intermediate. Black
down-arrows indicate component infusion as follows. RNAP: we first
introduce 25 pM RNAP holoenzyme, 500 nM GreB, 200 µM ATP,
200 µM UTP and 200 µM GTP, and wait for RNAP to stall on DNA. Wash:
we next wash out all free components except for GreB. Mfd ATP: we next
infuse 100 nM Mfd, 500 nM GreB and 50 µM ATP and wait for Mfd to
remodel RNAP and form the Mfd-RNAP intermediate. Wash: we next
wash out all free components except GreB. UvrAB: we next infuse
50 pM UvrA, 250 nM UvrB and 500 nM GreB. We wait several thousand
seconds, without any observed change in the intermediate state.
ATP: finally, we infuse 2 mM ATP into the reaction and rapidly observe
resolution of the intermediate species. b, No Mfd control: UvrAB does
not functionally interact with RNAP in the absence of Mfd. Stalled RNAP
formed as in a is not displaced in the presence of (down-arrow) 50 pM
UvrA, 250 nM UvrB, 500 nM GreB and 2 mM ATP. c, No UvrA control:
lifetime distribution for the Mfd-RNAP intermediate in the presence
of UvrB alone. Stalled RNAP is formed as in a. We then wash out free
RNAP while maintaining GreB and NTPs in solution, and add 100 nM
Mfd, 250 nM UvrB and 2 mM ATP while maintaining GreB and NTPs
in solution. The lifetime of the Mfd-RNAP intermediate thus formed
remains long (642 ± 22 s (s.e.m.), n = 80) with Gaussian statistics.
Extended Data Figure 8 | Control experiments for UvrA and UvrB
interactions with DNA in the absence of damage as seen in the
tethered-DNA supercoiling assay. Experiments were conducted on
positively supercoiled DNA bearing a CTP-less cassette. a, UvrA alone
compacts undamaged, supercoiled DNA in a non-specific manner even
at concentrations as low as 10 pM. Trace shown obtained with 1 mM ATP;
the same phenomenon is observed in the absence of ATP (data not
shown). b, UvrB prevents non-specific interaction of UvrA with
DNA; (t = 0 s) 250 nM UvrB alone does not compact DNA, although it
transiently interacts non-specifically and briefly with DNA in the presence
of 1 mM ATP (see c-f). The same phenomenon is observed in the absence
of ATP (data not shown); (t = 2,000 s) addition of UvrB also prevents UvrA
from compacting DNA non-specifically. On the basis of these data we
set the working UvrB concentration to 250 nM: our measurements with
UvrA can thus go up to 100 pM, which remains more than 90% saturated
by this concentration of UvrB as shown by the fact that we can perform
measurements without DNA compaction. c, d, Time-traces obtained on
positively supercoiled DNA in the presence of 250 nM UvrB and 1 mM
ATP show supercoiling-dependence of the dwell time (t_dwell) of UvrB-DNA
‘wrapping’ events. Indeed the amplitude of these events (~50-100 nm)
is consistent with titration of a large positive supercoil by formation of a
tight/compact, positive wrap of DNA around UvrB as observed in AFM
imaging (ref. 27). e, f, Histograms of the dwell time of the wrap state obtained
above are fitted to single-exponential distributions, with a mean dwell time
of (e) 28 ± 2 s (s.e.m., n = 175, +5 turns), and (f) 66 ± 5 s (s.e.m., n = 117,
+6 turns). By performing experiments with no more than 250 nM UvrB
and with only +4 turns of positive supercoiling, this wrap state is of order
10 s and does not significantly interfere with detection of Mfd-RNAP
intermediates or their resolution, and UvrB safely inhibits DNA compaction
activity by UvrA.
Extended Data Figure 9 | Correlation between resolution of the
Mfd-RNAP intermediate and loss of fluorescent signal from labelled RNAP
or Mfd in the NanoCOSM assay. We plot the time elapsed between loss of
fluorescence signal from (a) fluorescent RNAP or (b) fluorescent Mfd and
nanomechanical resolution of the Mfd-RNAP intermediate as observed
in the magnetic trap, as shown in Fig. 3a, b. In both cases the vast majority
of events are correlated as shown by the fact that loss of fluorescence and
nanomechanical resolution of the intermediate temporally coincide (that
is, the time between the two events is nil). Loss of fluorescence before
nanomechanical resolution (that is, indicated as positive times) is most
probably due to spontaneous photobleaching of the DY-549 fluorophore
used to label proteins. No significant difference is observed between DNA
substrates bearing a CPD or bearing a C-less cassette.
Extended Data Figure 10 | Characterization of specific, control and
non-specific GGR incision using the tethered-DNA assay. a, b,
Time-traces showing GGR incision on positively and negatively supercoiled
CPD-bearing DNA. For positively supercoiled DNA (+sc), addition of
UvrABC proteins (1 nM UvrA, 250 nM UvrB, 100 pM UvrC and
100 pM pUC18 competitor DNA) and 1 mM ATP led to DNA incision
and an abrupt loss of supercoiling. The average GGR incision times are
1,230 ± 195 s (s.e.m., n = 72 events) and 1,156 ± 256 s (s.e.m., n = 40
events) for +sc and −sc, respectively; see Fig. 3f, g for distributions and
fits. c, As in a, but in the absence of ATP. The absence of incision was
confirmed on 22 molecules over a ~4 h window. Upon supplementing
the reactions with 1 mM ATP (red down-arrow) incision rapidly takes
place. d, UvrC (100 pM) and ATP (1 mM) are unable to incise positively
supercoiled, CPD-bearing DNA. The absence of incision was confirmed
on 31 molecules over a ~2 h window. e, Incision times for UvrABC (as
above) acting on positively supercoiled DNA bearing a C-less cassette (that
is, undamaged) are essentially normally distributed (red line) with a mean
of 2,922 ± 222 s (n = 44 events; the fit was obtained by excluding points
between 3,000 and 4,000 s). f, Incision times for UvrABC (as above) acting
on negatively supercoiled DNA bearing a C-less cassette are essentially
normally distributed with a mean of 2,471 ± 377 s (s.e.m., n = 28 events;
the fit was obtained by excluding points below 1,000 s). g, Incision
times for UvrABC acting on positively supercoiled DNA bearing a CPD
protected by stalled RNAP in the absence of Mfd are essentially normally
distributed with a mean of 2,348 ± 672 s (s.e.m., n = 31 events). Red lines
are guides to the eye (c, d) and overall results confirm all of UvrAB, UvrC
and ATP are required for GGR incision. Results from e-g further indicate
that non-specific DNA incision by the complete GGR system can take
place in this assay; however, it is slow enough to permit measurement
of faster specific incision rates discussed in Fig. 3. We propose these
incision events are in fact specific to the multiple biotin- and
digoxigenin-based tethers at the ends of the DNA construct, which can ultimately
be recognized as DNA damage by the GGR machinery. Because only
nicking at the first tethering biotin or dig, and within the 2 kbp fragment,
will result in loss of supercoiling, then, statistically, multiple incisions at
multiple tethers must be realized before loss of supercoiling, resulting
in a normal distribution. This can be compared to DNA incision by
endonuclease, the time distribution of which is single-exponential (ref. 28).
As incision on these constructs is significantly slower and obeys different
statistics than that on CPD-containing DNA, we conclude that our
single-molecule measurements can indeed isolate CPD-specific from
non-specific incision by the GGR machinery.
Corrigendum: Dietary emulsifiers
impact the mouse gut microbiota
promoting colitis and metabolic
syndrome


Benoit Chassaing, Omry Koren, Julia K. Goodrich, 
Angela C. Poole, Shanthi Srinivasan, Ruth E. Ley & 
Andrew T. Gewirtz 


Nature 519, 92-96 (2015); doi:10.1038/nature14232 


Some clarifications are provided to this Letter; these do not alter any of 
the central conclusions but, rather, are provided in the interests of trans- 
parency and reproducibility. Our Letter indicated that experiments 
were performed on 4-week-old mice (unless stated otherwise). In fact, 
for several experiments, mice ranged from 5 to 7 weeks as follows: 
Fig. 4a-h, Extended Data Fig. 9g-w, b′-t′, z: 5 weeks old; Figs 1, 2, 3a-d,
4i-o, Extended Data Figs 1a-d, 2, 4, 5s-v, 6, 7h-k, 8l-s, 9a-f, x, y, 10:
6 weeks old; Fig. 3e-l, Extended Data Fig. 1e-l, 5q, 1, 7a-g, l-h′, 8a-k,
t-o′, 9a′: 7 weeks old. The weight gain versus time curves are affected
by mouse age and hence explain why the kinetics of weight gain differ 
among control mice when comparing between different experiments. 

For each experiment, we listed the average n value for all the con- 
ditions within each panel, which differed from the exact n for each 
experimental condition within each figure, as shown in Supplementary 
Table 1 of this Corrigendum. 

Furthermore, our Letter reported relative changes in body mass and 
absolute mass of fat pads, thus not permitting assessment of absolute 
weight changes nor fat pad mass relative to total body mass. Hence, we 
provide measures of absolute and relative body and fat pad mass in a 
side-by-side manner in the Supplementary Data to this Corrigendum. 


Supplementary Information is available in the online version of this Corrigendum. 
human nephrogenesis
Barbara Maier, Gregory J. Baillie, Charles Ferguson,
Robert G. Parton, Ernst J. Wolvetang, Matthias S. Roost, 
Susana M. Chuva de Sousa Lopes & Melissa H. Little 


Nature 526, 564-568 (2015); doi:10.1038/nature15695 


In the Methods of this Letter, ‘5,000’ should have read ‘15,000’ in
the sentence: “Then, cells were again plated on a Matrigel-coated at
15,000 cells per cm² in MEF-CM.” This error has been corrected online.
Universal resilience patterns in
complex networks
Nature 530, 307-312 (2016); doi:10.1038/nature16948


In the last sentence of page 310 of this Letter, the parameter h should
equal 2, rather than 1. In addition, after equation (4), the text should
have stated ‘A_ij > 0’ and ‘positive interactions’, to read “...the weighted
connectivity matrix A_ij > 0 captures the positive interactions between
the nodes.” These errors have been corrected online.
Ecology is one of a few fields moving towards the multiple-working-hypotheses method of investigation. 


RESEARCH PROTOCOLS 


A forest of 
hypotheses 


Falling in love with a single theory can cut off fruitful 
avenues of enquiry. Here’s how to keep your mind open. 


BY JULIA ROSEN 


The clamour in a Panamanian rainforest
is deafening to human ears: bugs shriek,
birds sing and bats screech throughout
the humid night. To avoid attracting predators,
male katydids (Tettigoniidae) trill out short,
infrequent mating calls less than a second long.
Postdoc Laurel Symes, who studies sensory
perception and decision-making at Dartmouth
College in Hanover, New Hampshire, wants to
understand how female katydids find their
mates. She first thought they must have highly
sensitive hearing. But she juggles other ideas
at the same time: maybe katydids always meet
up on a certain type of host plant, have neural
mechanisms that filter out background noise or
use another trick entirely.

These aren't just idle musings: Symes’s
collection of hypotheses is an integral part of her
research. The approach helps her to home in on
answers and avoid investment in a sole idea, a
common tendency in science that can lead to
trouble. History contains numerous examples of
scientists who missed important clues because
they clung too tightly to a favourite hypothesis.
One way to avoid this fate is to consider many
potential hypotheses.

Proponents of the multiple-working-hypoth- 
eses method say that it prevents scientists from 
developing ‘tunnel vision’, and enables them to 
embrace the possibility that several hypotheses 
might be true at once. Practising the approach 
takes discipline: researchers must brainstorm 
possible explanations for a scientific phenom- 
enon before collecting or analysing data, and 
use techniques such as scrambling the order of 
samples and blinding data to help to counter- 
act favouritism. It also demands that scientists 
remain open-minded during the entire research 
process, and continually refine their hypotheses. 


A LONG HISTORY 

The method of multiple working hypotheses 
was formally articulated in 1890 by geologist 
Thomas Chrowder Chamberlin, then president 
of the University of Wisconsin—Madison. Build- 
ing on the ideas of fellow geologist Grove Karl 
Gilbert, Chamberlin warned that when scien- 
tists come up with an original idea, they tend to 
develop affection for it, which can cloud their 
ability to do objective work. He argued that the 
solution was to generate and explore a family of 
hypotheses. By coming up with alternatives, he 
suggested, scientists would not be inclined to 
favour one idea. 

Although the concept has faced criticism, 
aimed mainly at the impossibility of conceiv- 
ing — let alone testing — all possibilities, many 
scientists say that it is as relevant today as ever. 
The pressure to publish in high-profile journals, 
win grants and build a reputation can prompt 
researchers — consciously or not — to seek 
support for pet ideas. One study posted to the 
preprint server arXiv in June found that when 
programmers introduced these kinds of incen- 
tives into a model, simulated research groups 
succumbed to pressures to show support for 
original ideas, often erroneously. 

Ecologist Barry Brook of the University of 
Tasmania in Australia thinks that resurrecting 
Chamberlin’s ideas could help. In 2007, he co- 
authored a paper on the merits of using multi- 
ple working hypotheses for twenty-first-century 
science. In many cases, he argues, the method 
produces more insightful results than testing 
null hypotheses, which reveals only whether a 
specific factor has a discernible effect. Multiple 
hypotheses, by contrast, can help scientists
to work out whether that effect is important,
and whether several factors might be at play.

Brook, for example, wanted to know why 
small mammals such as brown bandicoots 
(Isoodon macrourus) were disappearing from 
Northern Australia’s Kakadu National Park. 
Many scientists had pointed in the past to 
introduced predators, such as cats, which 
seemed plausible. But when he considered other 
hypotheses and looked at historical population 
data, he found that cats had a negligible role, and 
that intense wildfires bore most blame. “You
can be surprised at how little support most of
your well-crafted hypotheses can have,” he says.

It might seem simpler to consider just one 
possible explanation, but ignoring other mod- 
els can be dangerous. “That's not only dishonest, 
but it will also lead you down bad inferential 
pathways,” Brook says. 


RESIST TEMPTATION 

It can be challenging to put the method into 
practice because researchers must battle their 
own natural enthusiasm for an alluring idea. 
The first step is to set aside time to articulate 
other hypotheses before one starts to gain
traction. If not, a favoured hypothesis might skew
the process of data collection or analysis when 
one heads out into the field, starts an experi- 
ment or dives into a data set. “If you have a 
hypothesis or you're looking for a pattern, some- 
times you won't actually honour what pattern is 
there,” says Kathleen Nicoll, a geographer at the 
University of Utah in Salt Lake City. 

When coming up with a collection of 
hypotheses, it can be helpful to have patience 
and consult labmates — and to include a seem- 
ingly outrageous hypothesis. This idea was first 
advocated in 1926 by William Morris Davis, a 
retired geologist from Harvard University in 
Cambridge, Massachusetts, as a way to break 
out of conventional thinking. Many notable sci- 
entific advances fall into this category, including 
Alfred Wegener's then-scandalous claim in 1912 
that continents migrate across Earth’s surface 
(they do), and the heretical proposal, developed 
in the 1920s by geologist J Harlen Bretz, that a 
catastrophic flood scoured out the heavily chan- 
neled landscapes of Washington (in fact, many 
violent floods swept through the region). 

Symes finds that using multiple hypotheses 
yields the best results if researchers generate 
ideas that rely on different processes and make 
distinct predictions. In her research, a host- 
plant preference might lead to katydids hav- 
ing the same food in their guts, whereas using 
sound might imply that female katydids in 
Panama have more sensitive ears than species 
in forests without predatory bats. By identifying 
possible outcomes, she can design her experi- 
ments in ways that help to distinguish these 
ideas. “If the hypotheses are mutually exclusive 
or different in their mechanism, then you are 
going to learn something,’ she says. 

Consideration of multiple working hypotheses
continues during data processing and
analysis, when scientists must take other steps
to protect their objectivity (see ‘Don’t play
favourites’).

For Lydia Tackett, who studies marine fossils 
at North Dakota State University in Fargo, the 
solution is as simple as analysing samples out 
of order. Working chronologically through a 
geological sequence led her to identify trends 
prematurely and anticipate what she would find 
in subsequent layers. “Now, I collect the bulk 
samples I need and randomize the order,” she 
says. She codes them so that she doesn’t know 
exactly which layer each sample came from. 
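Tackett's routine of shuffling samples and hiding their origin can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not her actual workflow: the function name `blind_sample_order`, the `S001`-style codes and the layer labels are all hypothetical.

```python
import random


def blind_sample_order(sample_labels, seed=None):
    """Shuffle samples and hide their origin behind opaque codes.

    Returns (codes, key): `codes` is the order in which to analyse the
    samples, and `key` maps each code back to its true label. The key
    should stay sealed (for example, with a colleague) until the
    analysis is finished.
    """
    rng = random.Random(seed)
    shuffled = list(sample_labels)
    rng.shuffle(shuffled)
    # Codes are assigned after shuffling, so a code number says nothing
    # about where the sample sits in the geological sequence.
    key = {f"S{i:03d}": label for i, label in enumerate(shuffled, start=1)}
    return list(key), key


# Hypothetical layer names, listed in chronological order.
layers = [f"layer_{n}" for n in range(1, 9)]
codes, key = blind_sample_order(layers, seed=42)
print(codes)  # analyse in this order; consult `key` only at the end
```

Passing a seed makes the shuffle reproducible, which helps if the coding ever has to be audited; leaving it out gives a fresh random order each time.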

Others rely on statistical tools. Instead of 
using P-values to reject individual models one at
a time, Trevor Branch, a fisheries scientist at the
University of Washington in Seattle, embraces 
a model-selection technique called Akaike’s 
information criterion (AIC). This statistical 
method determines which of a set of models 
best explains data collected about an often- 
complex system. Branch says that it’s a math- 
ematical way of implementing Chamberlin’s 
method of multiple working hypotheses. 

Brook uses the AIC as well as the similar 
Bayesian information criterion, which is useful 
for distinguishing between a few simple mod- 
els. When several models seem to be true, these 
methods help to weight their relative impor- 
tance, so that their combined effects can be 
explored through something called multimodel 
inference. That involves merging several differ- 
ent models and considering them simultane- 
ously to explain as much as possible. 
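For readers who have not met these criteria, the sketch below uses the standard textbook definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L (k parameters, n observations, L the maximized likelihood), together with Akaike weights, which turn AIC differences into relative support that can be used for model averaging. The model names and all numbers are invented for illustration and do not come from any study mentioned here.

```python
import math


def aic(log_likelihood, k):
    """Akaike's information criterion: lower is better."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: penalizes parameters more as n grows."""
    return k * math.log(n) - 2 * log_likelihood


def akaike_weights(scores):
    """Convert criterion scores into relative weights that sum to 1."""
    best = min(scores.values())
    rel = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Invented fits: (maximized log-likelihood, number of parameters).
fits = {"model_a": (-120.0, 3), "model_b": (-112.5, 4), "model_c": (-112.0, 6)}
scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
weights = akaike_weights(scores)

# Multimodel inference: average each model's prediction by its weight.
predictions = {"model_a": 10.0, "model_b": 14.0, "model_c": 15.0}  # invented
averaged = sum(weights[m] * predictions[m] for m in fits)
print(scores, weights, averaged)
```

The weights make the comparison explicit: no model is accepted or rejected outright, and a model with little support still contributes a little to the averaged prediction.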

PET IDEAS
Don’t play favourites

To apply the multiple-working-hypotheses method, try these tips:

● Devise a list of possible hypotheses
before collecting or looking at new data.
● Talk to colleagues and try to challenge
your assumptions by creating at least
one outrageous hypothesis.
● To learn most efficiently, develop
hypotheses that are as distinct from each
other as possible.
● Use analytical techniques that block
you from developing preliminary ideas
about what your data are telling you.
This could include analysing samples
out of order, blinding your data or using
different statistical tests.
● Before looking at your data, try to
articulate all possible outcomes, and how
you would test and differentiate each one.
● Keep in mind that a null result is not a
failure but rather an additional piece of
information. J.A.


240 | NATURE | VOL 536 | 11 AUGUST 2016
© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.


Physicists and astronomers often take
extreme measures to prevent researcher bias
from creeping into their analyses. Saul Perlmutter,
an astrophysicist at the University of California,
Berkeley, relies on software or colleagues to
hide potentially telling clues in the data before
he sees them, a technique called blind analysis.
This might include adding randomly generated
numbers to data values, shifting them by
random amounts or hiding the axes on a graph.
The goal is to make sure that the researchers
don't see anything that could prime their minds
towards a particular interpretation, such as
a preliminary trend or hint of a discovery.

Before unblinding data, scientists on Perlmutter’s
team must circulate a memo explaining their
hypotheses and how they plan to test and
differentiate between them. “Everybody can decide
ahead of time whether that feels fair, that they
haven't treated any of the alternatives differently
than the others,” he says. Last year, Perlmutter
and psychologist Robert MacCoun of Stanford
University in California argued in a Nature
Comment⁶ that this approach could reduce
researcher bias in many fields.
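One of the simplest forms of blind analysis, shifting every value by a hidden amount, might look like the following sketch. It is a toy under stated assumptions rather than the procedure used by Perlmutter’s team: the function names `blind` and `unblind`, the offset range and the measurements are all hypothetical.

```python
import random


def blind(values, seed, max_shift=1.0):
    """Shift every value by one hidden random offset.

    Differences between values are preserved, so the analysis can be
    developed and debugged, but the absolute result stays hidden.
    """
    rng = random.Random(seed)
    offset = rng.uniform(-max_shift, max_shift)
    return [v + offset for v in values], offset


def unblind(blinded, offset):
    """Remove the hidden offset once the analysis plan is frozen."""
    return [v - offset for v in blinded]


measurements = [1.0, 2.0, 3.5]  # invented data
# In practice a colleague or a script would hold the seed and the offset.
blinded, secret_offset = blind(measurements, seed=7)
restored = unblind(blinded, secret_offset)
print(blinded)
```

The key design choice is that the person doing the analysis never sees `secret_offset` until the memo describing the planned tests has been circulated.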

Of course, there are situations in which 
multiple hypotheses aren't helpful — or even 
feasible. If researchers stumble on a mysteri- 
ous finding, they might struggle to come up 
with even a single plausible explanation. And 
even if they can cobble together a few, there 
is no guarantee that the correct hypothesis is 
among them. This is why hypotheses must
remain ‘working’, so that they can be refined
in light of new information.

Other situations present the opposite 
challenge: too many hypotheses. Freya 
Blekman, an experimental physicist at the 
Dutch-speaking Free University of Brussels, 
searches for elementary particles at facilities 
such as the Large Hadron Collider at CERN, 
Europe's particle-physics lab near Geneva, 
Switzerland. In her field, theorists have already
posited countless possibilities, and her task is 
to work out which ones the evidence supports. 

Because these models are often mutually 
exclusive, she typically evaluates them one at a 
time using P values — albeit held to an excep- 
tionally high standard of significance. In fields 
such as psychology and medicine, there is a 
growing movement to abandon this technique 
because it can tempt researchers to seek out 
analytical approaches that produce significant 
results. But Blekman says that the physics com- 
munity has largely eliminated this problem 
through blinding and by creating a culture so 
steeped in the ethos of multiple hypotheses 
that finding nothing is as important as finding 
something. “In our field, a null result is a valu- 
able result,” she says. 

Indeed, the method of multiple hypotheses
doesn't always have to be practised at the individual
level, and can take place across entire
fields. Different groups can advance various
hypotheses, as long as they remain open- 
minded, and the peer-review process can 
also help to promote the practice. “I think 
we have a duty as editors and reviewers to 
bring up alternatives,” says Branch, “and to 
require authors that come up with a new 
hypothesis to also include alternatives when 
they bring it up the first time around”. 

Regardless of how they apply the method, 
many researchers say that they stumbled 
across the idea of multiple hypotheses by 
accident, as graduate students or later. 
Branch had never heard of the concept until 
a few years ago, but was so struck by it that 
he wrote an article last year arguing that 
researchers should not seek a single, uni- 
versal explanation for how fisheries affect 
marine food webs, but should consider how 
different models might apply in various parts 
of the world⁷.

A few researchers say that their advisers 
encouraged them to read classic philosophy- 
of-science texts, such as Thomas Kuhn's 
Structure of Scientific Revolutions (Univ. Chi- 
cago Press, 1962), or fostered discussions on 
the practical side of the scientific method at 
lab meetings. But many scientists can make it 
through their entire careers without any for- 
mal training in how to develop hypotheses. 

That’s too bad, because learning and 
applying the multiple-hypothesis method 
can improve the calibre of scientists’ work 
and empower scientists themselves, says 
Symes, who published a guide last year on 
teaching the research process⁸. “It always
pains me to see students who define success 
and failure as whether they support a par- 
ticular hypothesis,” she says. “Failing is not 
collecting the data you need. Succeeding is 
being able to differentiate the possibilities.” ■


Julia Rosen is a freelance writer in 
Portland, Oregon. 


1. Chamberlin, T. C. Science 15, 92-96 (1890). 

2. Smaldino, P. E. & McElreath, R. Preprint at 
https://arxiv.org/abs/1605.09511 (2016). 

3. Elliott, L. P. & Brook, B. W. BioScience 57,
608-614 (2007). 

4. Pardon, L. G., Brook, B. W., Griffiths, A. D. & 
Braithwaite, R. W. J. Animal Ecol. 72, 106-115 
(2003). 

5. Wegener, A. Petermanns Geogr. Mitt. 58, 
185-195, 253-256, 305-309 (1912).

6. MacCoun, R. & Perlmutter, S. Nature 526, 
187-189 (2015). 

7. Branch, T. A. Fisheries 40, 373-375 (2015). 

8. Symes, L. B., Serrell, N. & Ayres, M. P. Bull. Ecol. 
Soc. Am. 96, 352-367 (2015). 


CORRECTION 

The Careers Feature ‘Partners in 
knowledge’ (Nature 535, 581-582; 
2016) mistakenly attributed the tradition 
of depicting unusual events on buffalo 
hides to the Great Lakes region. It is 
actually a Great Plains tradition. 
TURNING POINT


Planet navigator 


Chikako Hirose, an aerospace engineer for the 
Japan Aerospace Exploration Agency (JAXA), 
led the team that steered the Akatsuki probe into 
orbit around Venus on 7 December 2015. She 
has directed Japan’s only successful planetary
mission so far, recovering the spacecraft from a 
failed insertion attempt in 2010. 


What led you to become an aerospace 
engineer? 

When I was nine years old, I learned from my 
schoolteacher that human beings had been to 
the Moon. I became curious about space. At 
15, I sent out letters to many laboratories at 
NASA, asking for advice on how to get involved 
in space-related activities. I got lucky — one 
retired engineer from NASA's Goddard Space 
Flight Center replied. He told me to study hard 
in chemistry, physics and mathematics. When 
I was 19, JAXA announced that 20 students 
would be selected to attend the 50th Inter- 
national Astronautical Congress in Amsterdam, 
which I applied for. The opportunity eventually 
led to an official job offer from JAXA. 


Why were you in the control room when 
Akatsuki failed to enter Venus’s orbit in 2010? 
I wanted to get involved in deep-space missions. 
I would go to the Akatsuki project room every 
day just to see if there was something I could do. 
Mostly, I just listened. The spacecraft was pass- 
ing behind Venus when it was set to enter orbit, 
so we couldn’t receive continuous signals. When
the predicted time came, we didn’t receive any- 
thing. One second passed, two, three — after 
15 seconds, people were whispering, “What is 
happening to Akatsuki?” We found out that 
the main engine hadn’t fired as planned, so the
spacecraft had gone into safe mode and was 
tumbling. You could see the disappointment 
on the faces of the scientists. 


How did you end up leading the recovery? 

I had done work analysing space debris and 
estimating its close approach to satellites. This 
experience made me an expert in trajectory 
and orbital analysis. We determined, on the 
basis of the gravity of the Sun and Venus, that 
Akatsuki would only re-encounter Venus five 
years later. We tried to preserve the spacecraft 
as best we could. Its design life was just two 
and a half years. 


What was the key constraint in designing 
Akatsuki’s new trajectory? 

The spacecraft’s orbit had become very long 
and elliptical — 370,000 kilometres at its 
farthest distance from Venus (similar to the
distance between Earth and the Moon) and
400 kilometres at its closest. At its farthest 
point, the spacecraft could take more than 
ten hours to pass through the planet’s shadow. 
But Akatsuki’s solar-charged batteries last 
for less than two hours. We had to adjust the 
spacecraft’s orbit several times over five years 
and perform a manoeuvre so as not to exceed 
Akatsuki’s battery life. 


How confident were you that the mission 
would succeed? 

I still didn’t know whether Akatsuki’s engines 
really worked. Our initial plan was to use the 
four engines on one side. If they failed, we were 
prepared to rotate the spacecraft 180 degrees 
to use the four engines on the other side. 
We were closely monitoring the velocity of 
the spacecraft, and saw that the change was 
exactly as expected. We knew that Akatsuki 
had entered into orbit around Venus. 


How did you celebrate? 

In 2010, we had made preparations to 
celebrate, but failed. In 2015, I had brought a 
bottle of champagne with me, but didn't tell 
any of my colleagues until after the operation 
was complete. We opened the bottle and 
drank it together. 


Are you still involved with Akatsuki? 

Yes. I am still responsible for controlling Akat- 
suki’s orientation with respect to Venus, which 
changes almost every hour when the craft is 
closest to the planet. I also have to ensure that 
the spacecraft is oriented correctly for down- 
linking its observation data to Earth. We 
expect Akatsuki to survive another five years 
before crashing into Venus. ■


INTERVIEW BY SMRITI MALLAPATY 


This interview has been edited for length and clarity. 






SCIENCE FICTION


WALLS OF NIGERIA 


BY JEREMY SZAL 


[…] at the twisted remains of Lagos through the visor of my
exosuit as I stalk down the hill.
Buildings crumble and slide into
the sea. Coils of fiery smoke curl up
to the sky. So much work, so much
craftsmanship. Gone in weeks.

I’m panting as I continue down the hill. With the
cooling system broken, I’m swimming in sweat inside
this thing. It’s gunmetal grey, covering me from
sole to scalp and weighing several hundred kilos.
If it weren’t for the hydraulics built along my spine,
moving in it would be impossible. I have to make
extra effort to control it now; the suit seems
to have a mind of its own. Cancelling my
HUD commands, seizing up at random
intervals, cutting off my sensory details.

I’m nearing the school I used to attend,
years before any of this happened. A few
lone palm trees remain, fronds swaying in
the sour wind. I remember being in class one
stifling Tuesday, me and Tendai trying to
sneak out when we first heard we’d captured
one of the K Dasewh. After all these years,
we’d finally got an alien.

There are remains of a soldier over by the
school. An art mural covers the wall, unfinished
words scrawled on blasted brick the
colour of red earth. Chalk lies strewn on the
ground. Even though his armour has been
cracked open, it still pulses with blue bioluminescence.
The suit had grown into his
flesh like a graft, the metal and matte and
wires worming through his dark skin like
tendrils. I step over empty coconut shells to
check his suit’s reading to see when he died.
Almost three months ago. He’d been wearing
his suit for only two months and he’s this
far gone.

I’ve been inside mine for two years.

My skin crawls with the memory of
being locked into our suits of armour, laced
with alien DNA. They’d dissected these
aliens, taken the self-healing and enhanced
strength in their biotech and transferred it
to us. For a while, it […]
us, it needs living tissue. You can't get biomass
from nothing. So the suit slowly grew
inwards into flesh, tunnelling through open 
wounds and organs damaged from battle. 
Fusing into the wearer. The quarantine came 
around too late. 

I wonder if I have any flesh left, if the 
cables have wrapped around my bones like 
creepers around a tree. If it’s started corrod- 
ing my brain, trying to take complete control 
of the suit. Tightening its grip by the day. But 
I have no way of knowing. And that scares 
me the most. 

Over in the distance, there's a biosphere 
laid out over the ground — where some of 
the last human settlements still reside. We're 
not allowed within five klicks of them for 
risk of infection. They’re still getting refu- 
gees from Ghana and Cameroon, but most 
of them have already been placed on off- 
world colonies and habitable planets outside 
the Solar System. 

My wife and sons are among them. Ben 
should be six years old now and Emeka 
eight, maybe nine. 

These are just the last few that have lin- 
gered behind on Earth to make sure that no 
one gets left behind. No one except us. 

I log into my commander’s channel. It 
takes me three tries to get it right; the suit 
attempts to cancel it. But I manage it. 

“You still out there, son?” The grizzled face 
of Commander Somadina pops into my bot- 
tom-right vision. “I thought you were dead.” 

I wish I was. I truly do. “I’m still here.”

“I wish I could help. But we can’t let any 
of you Stained inside the sphere. We can’t let 
the biotech virus spread, especially not to
the new colony.”

My jaws lock and my muscles tighten 
against my armour. “After everything we 
did?” I spread my arms, armour plates like 
fetters around my wrists. “We fought for 
this city with everything we had.” 

“And look what happened anyway.” He 
shakes his head. “They sent their entire 
fleet and destroyed it.” 

No matter what we did, how hard
we fought, it wasn’t enough. By the
time we’d destroyed the last of their
ships, our world was broken.

I crane my neck to look at the
sky. Somewhere, out in the
giant cosmos of space, is
my family. “At least let me
talk to my wife one last
time. Let me send a message.”

“Cannot be done. We can't tell you where 
the colony is. What if they capture and tor- 
ture you? Besides, your armour will store the 
location. All other Stained are in the same 
position.”

I want to scream. I want to laugh like a 
madman. My throat’s filled with concrete 
and every word feels like it’s being fishhooked 
from my gut. Maybe now the suit has started 
consuming my throat and vocal cords. Soon 
I won’t be able to speak. “So that’s it?”

“I’m so sorry.” He can't even look at me.
“Goodbye, Kohban.”

He cuts the connection. Leaving me here, 
shackles worming deeper and deeper into 
my body. 

The weight of my armour and of the 
cosmos pressing down on my shoulders, I 
stagger to the wall and scoop up some chalk. 
Hands shaking, I scrawl a message to my 
friends and family, to the people of Nigeria. 
I do it quickly, before the armour locks up. 
Telling them that I miss them, that I’m
part of this world now. That the K Dasewh 
will never have our planet. 

And one day, when my people return to 
a new, clean Earth, this message will greet 
them. I hope I’m not here when that hap- 
pens. 

My eyes blur. It could be tears, or could be 
the suit trying to obscure my vision. I don't 
think I'll ever know for sure. ■